ABSTRACT

We investigate the effects of potential sources of systematic error on the angular and
photometric redshift, z_phot, distributions of a sample of redshift 0.4 < z < 0.7 massive
galaxies whose selection matches that of the Baryon Oscillation Spectroscopic Survey
(BOSS) constant mass sample. Utilizing over 112,778 BOSS spectra as a training
sample, we produce a photometric redshift catalog for the galaxies in the SDSS DR8
imaging area that, after masking, covers nearly one quarter of the sky (9,913 square
degrees). We investigate fluctuations in the number density of objects in this sample
as a function of Galactic extinction, seeing, stellar density, sky background, airmass,
photometric offset, and North/South Galactic hemisphere. We find that the presence
of stars of comparable magnitudes to our galaxies (which are not traditionally masked)
effectively removes area. Failing to correct for such stars can produce systematic errors
on the measured angular auto-correlation function, w(θ), that are larger than its
statistical uncertainty. We describe how one can effectively mask for the presence of the
stars, without removing any galaxies from the sample, and minimize the systematic
error. Additionally, we apply two separate methods that can be used to correct the
systematic errors imparted by any parameter that can be turned into a map on the
sky. We find that failing to properly account for varying sky background introduces
a systematic error on w(θ). We measure w(θ) in four z_phot slices of width 0.05
between 0.45 < z_phot < 0.65 and find that the measurements, after correcting for the
systematic effects of stars and sky background, are generally consistent with a generic
ΛCDM model, at scales up to 60°. At scales greater than 3° and z_phot > 0.5, the
magnitude of the corrections we apply is greater than the statistical uncertainty in
w(θ). The photometric redshift catalog we produce will be made publicly available at
http://portal.nersc.gov/project/boss/galaxy/photoz/.

Key words: Galaxies - clustering. 



1 INTRODUCTION 

Wide-field, multi-band imaging surveys provide photometric
redshift estimates for many millions of galaxies. Photometric
redshifts, z_phot, are easier to obtain than spectroscopic
ones, z_spec, but the gain in numbers of objects is countered
by redshift uncertainties, σ_z, that are rarely better than
σ_z = 0.03(1 + z). Such photometric redshift surveys may
be referred to as having "2 + 1" dimensions: nearly all
of the radial clustering information is lost, but the redshift
errors are small enough to allow two dimensional clustering
measurements in redshift shells of width similar to σ_z. This
strategy has been utilized to explore the formation and evolution
of galaxies (see, e.g., Blake et al. 2008; McCracken
et al. 2008; Ross et al. 2009, 2010a,b; Wake et al. 2011)
and quasars (e.g., Myers et al. 2006) and also to measure
cosmological parameters (see, e.g., Blake et al. 2007; Padmanabhan
et al. 2007; Ross et al. 2008; Thomas et al. 2010a,
2011; Crocce et al. 2011). Such studies are gaining in importance,
as future surveys such as The Dark Energy Survey^1
(DES), The Large Synoptic Survey Telescope^2 (LSST), and
the Panoramic Survey Telescope & Rapid Response System^3
(Pan-STARRS) will rely primarily on photometric redshifts.

At the largest scales, the accuracy of clustering measurements
is not critically dependent on the uncertainty of
each z_phot. However, contaminants or incorrect calibrations
do matter at these scales: the predicted clustering amplitudes
are negligible, so the small fluctuations caused by systematic
errors can produce spurious non-zero amplitudes. Studies (see, e.g.,
Sawangwit et al. 2009; Thomas et al. 2010b) have found apparent
excesses in the clustering strength at scales larger
than 100 h^-1 Mpc. Thorough studies are thus necessary to
determine any potential sources of systematic error that
could cause spurious fluctuations in the galaxy density field.

In this paper, we investigate the observational realities
that may cause fluctuations in the observed density of galaxies
when modelled incorrectly. These include stellar contamination
and masking, Galactic extinction, sky brightness,
seeing, airmass, and offsets in photometric calibration. The
effect of stellar contamination in a galaxy sample is well documented
(see, e.g. Myers et al. 2006; Thomas et al. 2010b;
Crocce et al. 2011). Stars may also cause a systematic effect
on the number density of objects by occulting a small fraction
of the sky. This area is on the order of 1 millionth of
a square degree per star, but with tens of millions of stars, it
becomes substantial given the precision to which clustering
measurements can now be made.

^1 http://www.darkenergysurvey.org

^2 http://www.lsst.org/lsst

^3 http://pan-starrs.ifa.hawaii.edu/public

Galactic extinction requires that magnitudes be corrected
for the effect of dust in our Galaxy. It has been noted
several times (e.g., Scranton et al. 2002; Myers et al. 2006;
Ross et al. 2006; Ho et al. 2008; Wang & Brunner in prep.)
that errors in this correction may cause a systematic effect
on the galaxy density field, as the effective depth of a
survey would fluctuate. Further, objects at a constant (extinction
corrected) magnitude have different observed fluxes (since the flux is
directly related to the magnitude before extinction corrections).
This implies that the expected magnitude error will
vary as a function of the Galactic extinction. Airmass has
a similar effect. It refers to the path length of
the photons through our atmosphere to the telescope, normalized
to unity for observations at the zenith, where it is
minimized. At higher airmass fewer photons reach the detector
because more are scattered or absorbed in the atmosphere,
and thus the error on a measured magnitude will be related
to the airmass. Finally, the observed flux of an object is more
spread out at higher seeing, which increases the magnitude
error and makes it more difficult to distinguish between stars
and galaxies. Either of these seeing-dependent effects could
cause spurious fluctuations in the observed density of galaxies.

We use data from the Sloan Digital Sky Survey (SDSS; 
York et al. 2000) eighth data release (DR8; Aihara et al. 
2011) to identify and remove potential sources of system- 
atic error on the angular clustering of objects selected to 
be luminous galaxies (LGs) with redshifts 0.4 < z < 0.7. 
Section 2 presents the data we use for our photometric red- 
shift catalog and the spectroscopic data we use to train the 
photometric redshifts we generate. Section 3 describes how 
we measure and model angular correlation functions. In Section 4
we investigate the fluctuations we find in the observed
number density of LGs as a function of observational parameters
and correct for the systematic errors these variations
may impart. Section 5 explains how we train the photometric
redshifts, the potential systematic effects we consider for
this training, and the resulting photometric redshift catalog
that we generate. In Section 6, we present measurements
of angular auto- and cross-correlation functions in slices of
width Δz_phot = 0.05, test their consistency with a generic
ΛCDM model, and determine how the galaxy bias we calculate
changes depending on the corrections we apply. We
conclude with a summary of our results and a discussion of
their greater implications in Section 7.



2 DATA 

We use imaging data from the SDSS DR8 (Aihara et al. 
2011) to create a photometric redshift catalog of galaxies. 
This survey obtained wide-field CCD photometry (Gunn et
al. 1998, 2006) in five passbands (u, g, r, i, z; e.g., Fukugita
et al. 1996), amassing a total footprint of 14,555 deg^2, data
for which object detection is reliable to r ~ 22 (Aihara et
al. 2011).

We use spectroscopy from the SDSS III Baryon Oscillation
Spectroscopic Survey (BOSS; Eisenstein et al. 2011) to
obtain the spectroscopic redshifts, z_spec, we use as a training
sample for our photometric redshift catalog. BOSS is
a spectroscopic survey that will target 1.5 million massive
galaxies, 150,000 quasars, and over 75,000 ancillary targets
over an area of 10,000 sq. degrees (Eisenstein et al. 2011).
BOSS observations began in 2009, and the last data will be
acquired in 2014. The BOSS spectrographs (R = 1300-3000)
are fed by 1000 fibres in a single pointing, each with a 2"
aperture. Each observation is performed in a series of
15-minute exposures and integrated until a fiducial minimum
signal-to-noise is reached. This ensures an isotropic sample,
complete to high redshift (z ~ 0.7), resulting in a redshift
completeness of ~ 97% over the full imaging footprint.



2.1 Selecting Imaging and Redshift Data 

Our photometric catalog has the same selection as the sample
of BOSS targets chosen to have approximately constant
stellar mass, denoted 'CMASS', as described by Eisenstein
et al. (2011). We select objects from the Catalog Archive
Server (CAS) PhotoPrimary table^4 identified as galaxies.
We use the subscript mod to denote the SDSS ubercalibrated
model magnitudes (Padmanabhan et al. 2008).
The subscript cmod denotes cmodel magnitudes, where the
cmodel flux, F_cmod, is defined as:

$$F_{\rm cmod} = f_{\rm psf} F_{\rm dev} + F_{\rm exp}\,(1.0 - f_{\rm psf}) \qquad (1)$$

where F is the flux, subscripts dev and exp refer to the best-fit
De Vaucouleurs and exponential profiles, and f_psf is the fraction
of the flux within the PSF. The CMASS selection is then



defined by:

$$17.5 < i_{\rm cmod} < 19.9 \qquad (2)$$

$$d_\perp > 0.55 \qquad (4)$$

$$i_{\rm fiber2} < 21.7 \qquad (5)$$

$$i_{\rm cmod} < 19.86 + 1.6\,(d_\perp - 0.8) \qquad (6)$$



^4 See http://skyserver.sdss.org/dr8/en/help/browser/browser.asp
for descriptions of the data contained within this table.



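where all magnitudes are corrected for Galactic extinction,
i_fiber2 is the i-band magnitude within a 2" aperture^5, and

$$d_\perp = r_{\rm mod} - i_{\rm mod} - (g_{\rm mod} - r_{\rm mod})/8.0. \qquad (7)$$

Stars are further separated from galaxies by only keeping
objects with

$$i_{\rm psf} - i_{\rm mod} > 0.2 + 0.2\,(20.0 - i_{\rm mod}) \qquad (8)$$

$$z_{\rm psf} - z_{\rm mod} > 9.125 - 0.46\,z_{\rm mod} \qquad (9)$$

unless the object passes a 'LOWZ' cut defined by

$$r_{\rm cmod} < 13.6 + c_\parallel/0.3 \qquad (10)$$

$$|c_\perp| < 0.2 \qquad (11)$$

$$16 < r_{\rm cmod} < 19.6 \qquad (12)$$

where

$$c_\parallel = 0.7\,(g_{\rm mod} - r_{\rm mod}) + 1.2\,(r_{\rm mod} - i_{\rm mod} - 0.18) \qquad (13)$$

and

$$c_\perp = r_{\rm mod} - i_{\rm mod} - (g_{\rm mod} - r_{\rm mod})/4.0 - 0.18. \qquad (14)$$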
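The selection above can be sketched as a set of boolean masks. This is a minimal illustration, not the pipeline used for the catalog: the function and argument names are hypothetical, the inputs are assumed to be extinction-corrected magnitudes held in NumPy arrays, and only the cuts that appear in this section (Eqs. 2 and 4-14) are applied.

```python
import numpy as np

def d_perp(g_mod, r_mod, i_mod):
    """Colour combination of Eq. 7."""
    return (r_mod - i_mod) - (g_mod - r_mod) / 8.0

def passes_cmass_cuts(g_mod, r_mod, i_mod, i_cmod, i_fiber2):
    """Eqs. 2, 4, 5 and 6 (no Eq. 3 appears in this section, so none is applied)."""
    dp = d_perp(g_mod, r_mod, i_mod)
    return ((i_cmod > 17.5) & (i_cmod < 19.9)
            & (dp > 0.55)
            & (i_fiber2 < 21.7)
            & (i_cmod < 19.86 + 1.6 * (dp - 0.8)))

def passes_star_galaxy_cut(i_psf, i_mod, z_psf, z_mod):
    """Eqs. 8 and 9: keep objects that are extended relative to the PSF."""
    return ((i_psf - i_mod > 0.2 + 0.2 * (20.0 - i_mod))
            & (z_psf - z_mod > 9.125 - 0.46 * z_mod))

def passes_lowz_cut(g_mod, r_mod, i_mod, r_cmod):
    """Eqs. 10-14: objects passing this cut are exempt from Eqs. 8 and 9."""
    c_par = 0.7 * (g_mod - r_mod) + 1.2 * (r_mod - i_mod - 0.18)
    c_perp = (r_mod - i_mod) - (g_mod - r_mod) / 4.0 - 0.18
    return ((r_cmod < 13.6 + c_par / 0.3)
            & (np.abs(c_perp) < 0.2)
            & (r_cmod > 16.0) & (r_cmod < 19.6))
```

Under these assumptions, the final mask would be `passes_cmass_cuts(...) & (passes_star_galaxy_cut(...) | passes_lowz_cut(...))`.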



These target selection criteria produce a sample of just
over 1.6 million objects, occupying over 11,000 square degrees
of area on the sky. We refer to these objects as 'LG'
(for luminous galaxy). We cut this sample down to the
main SDSS imaging area. We define this area as the data
contained in HEALPix (Gorski et al. 2005) pixels at N_side
= 1024 (this resolution breaks the full sky into 12,582,912
equal area pixels). Each pixel is assigned a weight given its
overlap with the imaging footprint (accounting for the area
taken up by bright stars), and we include only pixels with
weight at least 0.9. This process is described in detail in Ho
et al. (2011). Further, we only use data with seeing (defined
by the r-band psf-FWHM) less than 2".0 and Galactic extinction,
E(B - V) < 0.08, as determined via the dust maps
of Schlegel, Finkbeiner & Davis (1998). These cuts remove
a large fraction of the data, leaving only 1,065,823 objects.
Their footprint is displayed in Fig. 1: 9,913 square degrees,
2,554 of which are in the southern stripes. A total of 282,687
of the objects are in the southern stripes, meaning their
number density is 110.7 deg^-2, while the number density in
the north is 107.1 deg^-2, a 3.4% difference.
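The pixel count quoted above follows from the HEALPix relation N_pix = 12 N_side^2. The short sketch below reproduces that arithmetic and the corresponding pixel area; the function names are illustrative and no HEALPix library is needed.

```python
import math

def healpix_npix(nside):
    """Number of equal-area HEALPix pixels on the full sky: 12 * nside^2."""
    return 12 * nside ** 2

def healpix_pixel_area_deg2(nside):
    """Area of one pixel in square degrees (full sky = 4*pi steradians)."""
    full_sky_deg2 = 4.0 * math.pi * (180.0 / math.pi) ** 2
    return full_sky_deg2 / healpix_npix(nside)

print(healpix_npix(1024))             # 12582912
print(healpix_pixel_area_deg2(1024))  # about 0.0033 square degrees
```

Each N_side = 256 pixel used later for the correlation-function measurements contains 16 of the N_side = 1024 pixels used to build the mask.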

We match the masked LGs to the BOSS CMASS spectra
that had been observed and run through the spectroscopic
pipeline up to the 11th of November, 2010. This
yielded a sample of 112,778 spectroscopic redshifts that are
used to estimate photometric redshifts for our full sample.
We find that 3.7% of these spectroscopic objects are either
stars or quasars (2.7% stars and 1% quasars). The percentage
of quasars should be roughly constant across the sky,
but the stellar contamination will be highly dependent on
the proximity to the Galactic disk and center. We find that
the percentage of stars varies from 6% at Galactic latitude
b = 25° to 1% at b = 85°. Masters et al. (2011) find 2 ± 1%
point source contamination by inspecting high resolution
Hubble Space Telescope images of BOSS CMASS targets in
the COSMOS survey field (at b = 42°).

^ We use an ifiber2 limit of 21.7; although the limit has changed 
to 21.5 in current BOSS targeting, the limit was ^ 21.7 for all of 
the BOSS spectra in this study 
Figure 1. The density of objects in our catalog, in equatorial coordinates (left panel) and Galactic coordinates (right panel), after
masking for imaging area, seeing, Galactic extinction, and bright stars. This masked footprint occupies 9,913 square degrees. The density 
increases from blue to red, with blue representing a density that is less than 40% of the average and red representing a density that is 
120% greater than average. 



We also use stars to investigate systematic effects. We
select objects from PhotoPrimary that are identified as stars
and have 17.5 < i_mod < 20.5. In total, this is over 84 million
objects, but only 33 million reside within our masked
footprint.

2.2 Star/Galaxy Probability 

For the sample of BOSS spectra we use, 3.7% of objects that
are targeted as CMASS galaxies are spectroscopically classified
as either stars or quasars. We use the software package
ANNz^6 (Firth et al. 2003) to identify stars, as in previous
studies (e.g., Collister et al. 2007). Assigning galaxies a value
of 1 and stars/quasars a value of zero, we divide our spectroscopic
sample such that one quarter are placed into a training
set, another one quarter into a validation set, and the remaining
half into a testing set. We then train ANNz in order
to classify galaxies, using the five SDSS model magnitudes
and the parameters i_cmod, i_psf, i_fiber2, i_exp, R_pet,i, R_dev,i,
R_exp,i, AB_dev,i, AB_exp,i, lnL_star, lnL_exp, lnL_dev, where R
is the radius, AB is the axis ratio, subscript pet refers to
the best-fit Petrosian profile, and 'lnL' stands for the natural
log of the likelihood. The values ANNz returns, which we
denote 'p_sg', are predominantly between 0 and 1, and when
they are outside of these bounds, we set them to 0 and 1,
respectively. (This affects only 1% of the objects and does
not bias the overall probability distribution.)
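The bookkeeping described above (a quarter/quarter/half split and clipping of the network output) can be sketched as follows. This is only an illustration: the text does not say how objects were assigned to each set, so the random permutation and the function names are assumptions, and ANNz itself is not called.

```python
import numpy as np

def split_sample(n_objects, seed=0):
    """Split indices into training (1/4), validation (1/4) and testing (rest).

    A random split is assumed here; the assignment rule is not given in the text.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_objects)
    n_quarter = n_objects // 4
    train = order[:n_quarter]
    valid = order[n_quarter:2 * n_quarter]
    test = order[2 * n_quarter:]
    return train, valid, test

def clip_to_unit_interval(p_sg):
    """Set classifier outputs below 0 to 0 and above 1 to 1."""
    return np.clip(p_sg, 0.0, 1.0)

train, valid, test = split_sample(112778)
print(len(train), len(valid), len(test))
```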

We find that the star/galaxy training also does a good
job of estimating the probability that an object is a galaxy.
Fig. 2 displays the fraction of objects that are galaxies
('p_gal', as determined from their spectra) versus the value
of the star/galaxy parameter, p_sg. These two quantities are
nearly identical, as can be seen by comparing to the dashed
line. This implies we can treat p_sg as the probability that an
object is a galaxy, allowing us to remove most of the effect of
the stellar contamination. We note that the p_sg estimation
benefits greatly from having a large training set distributed
over 21.3° < |b| < 83.5° (less than 4% of the objects in our
catalog lie outside these bounds). Unless otherwise noted,
throughout we will be counting LGs by summing their values
of p_sg. For our full (masked) sample, the sum of p_sg is
1,021,885, suggesting that 4.1% of the objects are stars or
quasars.

^6 http://www.homepages.ucl.ac.uk/~ucapola/annz.html



3 MEASURING AND MODELING 
CORRELATION FUNCTIONS 

The primary focus of this work is to determine how systematics
may affect the measured clustering signal. Therefore,
we measure angular auto- and cross-correlation functions,
w(θ), of the density fields of LGs and of potential systematics.
These statistics can be calculated extremely quickly^7
and can be compared to cosmological models that are well
tested by simulation.



^7 Approximately 15 minutes total processing time using a single
2.53 GHz processor.
In the jack-knife error definition (Eq. 17), w(θ) is the measurement
over the entire area and w_i(θ) is the measurement when the ith
jack-knife region is removed.

We calculate the covariance matrix that we use to compare
our w(θ) measurements to theoretical models by transforming
theoretical P(k) to angular power spectra, and
from the covariance of the angular power spectra to the w(θ)
covariance, C(θ_1, θ_2). The methods for doing this are outlined
in, e.g., Crocce et al. (2010b) and Ross et al. (2011).
Specifically, C(θ_1, θ_2) is given by Eqs. 13 - 16 of Ross et al.
(2011), using a linear P(k). Crocce et al. (2010b) have extensively
tested these errors against mock catalogs, finding
them to accurately reproduce the covariance in w(θ) at linear
scales, and we restrict this study to those scales. Further,
at large scales, estimations of the off-diagonal elements
of the covariance matrix constructed using jack-knife methods
have large statistical uncertainty and can therefore lead
to unstable covariance matrices. Thus, we compare our measured
w(θ) to the model w(θ) and C(θ_1, θ_2) in order to find
the best-fit model. In Section 6, we compare the theoretical
and jack-knife uncertainties.



Figure 2. The fraction of objects that are galaxies versus the
value of the star/galaxy parameter (p_sg). The dashed line displays
the relationship p_galaxy = p_sg. Errors are Poisson.



3.1 Estimating w(θ) and its Covariance

We calculate w(θ) using HEALPix pixels with N_side = 256.
Our mask has removed all data in pixels with weight < 0.9
at the N_side = 1024 resolution. Thus, given that the weights
themselves are a good approximation of the fractional area
of a pixel, the area per pixel for N_side = 256 is well approximated
by its weight, wt, multiplied by the area of the pixel.
Therefore the over-density in pixel i, δ_i, is given by

$$\delta_i = \frac{n_i}{\bar{n}\,wt_i} - 1 \qquad (15)$$

where n_i is the number of galaxies in pixel i. For LGs and
stars, n̄ = Σ_i n_i / Σ_i wt_i, while for observational parameters
(such as Galactic extinction), n̄ = Σ_i A_i wt_i / Σ_i wt_i, where
A_i would be the average value of the observational parameter
in pixel i. The correlation function, w(θ), is given by
(see, e.g., Scranton et al. 2002)

$$w(\theta) = \frac{\sum_{i,j} \delta^a_i\, \delta^b_j\, \Theta_{i,j}(\theta)\, wt_i\, wt_j}{\sum_{i,j} \Theta_{i,j}(\theta)\, wt_i\, wt_j} \qquad (16)$$

where Θ_{i,j} is equal to 1 if pixel i is at an angular distance
θ ± Δθ from pixel j and zero otherwise, a = b represents
an auto-correlation function, and a ≠ b represents a
cross-correlation function.
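A brute-force sketch of the pixel-based estimator of Eqs. 15 and 16 is given below. It is illustrative only: it assumes the estimator is normalized by the weighted pair count in each bin, it builds all pixel pairs in memory (so it is practical only for small maps), and the function names and the use of unit vectors for pixel positions are choices made here rather than details taken from the text.

```python
import numpy as np

def overdensity(counts, weights):
    """Eq. 15: per-pixel over-density, with nbar = sum(counts) / sum(weights)."""
    nbar = counts.sum() / weights.sum()
    return counts / (nbar * weights) - 1.0

def w_theta(delta_a, delta_b, weights, unit_vecs, theta_edges):
    """Weighted pixel-pair estimator of w(theta) in bins of separation (radians).

    unit_vecs has shape (n_pix, 3) and holds the unit vector of each pixel centre.
    Bins containing no pixel pairs return nan.
    """
    cos_sep = np.clip(unit_vecs @ unit_vecs.T, -1.0, 1.0)
    sep = np.arccos(cos_sep)                       # angular separation of every pair
    pair_wt = np.outer(weights, weights)           # wt_i * wt_j
    pair_dd = np.outer(delta_a, delta_b) * pair_wt
    w = np.full(len(theta_edges) - 1, np.nan)
    for k in range(len(w)):
        in_bin = (sep >= theta_edges[k]) & (sep < theta_edges[k + 1])
        norm = pair_wt[in_bin].sum()
        if norm > 0:
            w[k] = pair_dd[in_bin].sum() / norm
    return w
```

Passing the same over-density array twice gives an auto-correlation; passing the galaxy over-density and the over-density of an observational parameter gives a cross-correlation.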

In order to compare w(θ) measurements, we calculate
jack-knife errors, σ_jack. We use 20 equal area jack-knife regions.
This is accomplished by selecting contiguous regions
of HEALPix pixels whose weights sum to 1/20th of the total.
The jack-knife errors are defined by (see, e.g., Scranton et
al. 2002; Myers et al. 2007; Ross et al. 2007)

$$\sigma^2_{\rm jack}(\theta) = \frac{19}{20} \sum_{i=1}^{20} \left[w(\theta) - w_i(\theta)\right]^2 \qquad (17)$$
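The jack-knife error of Eq. 17 can be sketched in a few lines, assuming the standard delete-one prefactor (N - 1)/N, which equals 19/20 for the 20 regions used here. The function name and array layout are illustrative.

```python
import numpy as np

def jackknife_sigma(w_full, w_jk):
    """Jack-knife standard deviation of w(theta).

    w_full : array of shape (n_bins,), measurement over the entire area.
    w_jk   : array of shape (n_regions, n_bins), where row i is the measurement
             with jack-knife region i removed.
    """
    w_jk = np.asarray(w_jk, dtype=float)
    n_regions = w_jk.shape[0]
    var = (n_regions - 1.0) / n_regions * np.sum((w_full - w_jk) ** 2, axis=0)
    return np.sqrt(var)
```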



3.2 Modeling 

We compare our measurements to theory, assuming a flat
ΛCDM cosmology with h = 0.7, Ω_m = 0.274, Ω_b/Ω_m =
0.18, n = 0.95, and σ_8 = 0.8 (as used in White et al. 2011),
and a CMB temperature of 2.725 K. We calculate the z = 0
linear power spectrum using CAMB^8 (Lewis et al. 2000).
We account for the effects of structure growth via (see, e.g.,
Seo & Eisenstein 2005; Crocce & Scoccimarro 2006):



P{k,z) = D{zfP{k,0)e-^'""''"^''^^^ 



(18) 



where D(z) is the linear growth rate; the exponential term
accounts for the damping imparted by large-scale velocity
flows. We derive s_b = 5.27 h^-1 Mpc using the Zel'dovich
approximation (Eisenstein et al. 2007) for our fiducial cosmology.
Such an approximation has been shown to be a good
fit to the BAO feature in recent N-body simulation results
(see, e.g., Seo et al. 2010).^9

Given P(k,z), we Fourier transform to determine
the isotropic 3-dimensional real-space correlation function
ξ_lin(r) (which is implicitly dependent on redshift). We then
incorporate the effects of mode-coupling (see, e.g., Crocce &
Scoccimarro 2008; Sanchez et al. 2008; Crocce et al. 2010b)
via

$$\xi(r) = \xi_{\rm lin}(r) + A_{\rm mc}\, \xi^{(1)}_{\rm lin}(r)\, \xi'_{\rm lin}(r), \qquad (19)$$

where ξ'_lin is the derivative of ξ_lin,

$$\xi^{(1)}_{\rm lin}(r) = \frac{1}{2\pi^2} \int P(k,z)\, j_1(kr)\, k\, dk, \qquad (20)$$

and for A_mc we use the value of 1.55 determined by Crocce
et al. (2010b).

^8 See camb.info

^9 This damping scale corresponds to 7.45 h^-1 Mpc, using the
convention used in Eisenstein et al. (2007).

We model the redshift-space correlation function as
(Hamilton 1992)
$$\xi^s(\mu, r) = \xi_0(r) P_0(\mu) + \xi_2(r) P_2(\mu) + \xi_4(r) P_4(\mu), \qquad (21)$$

where

$$\xi_0(r) = \left(b^2 + \frac{2}{3} b f + \frac{1}{5} f^2\right) \xi(r), \qquad (22)$$

$$\xi_2(r) = \left(\frac{4}{3} b f + \frac{4}{7} f^2\right) \left[\xi(r) - \xi'(r)\right], \qquad (23)$$

$$\xi_4(r) = \frac{8}{35} f^2 \left[\xi(r) + \frac{5}{2} \xi'(r) - \frac{7}{2} \xi''(r)\right], \qquad (24)$$

P_ℓ are the Legendre polynomials,

$$\xi' \equiv 3 r^{-3} \int_0^r \xi(r') (r')^2 dr', \qquad (25)$$

$$\xi'' \equiv 5 r^{-5} \int_0^r \xi(r') (r')^4 dr', \qquad (26)$$

b is the large-scale bias of the galaxy population being considered,
and μ is the cosine of the angle to the line of sight.
In order to calculate model w(θ), we must project ξ^s(μ, r)
over the radial distribution of galaxy pairs in a particular
sample (or samples in the case of cross-correlations):

$$w(\theta) = \int dz_1 \int dz_2\, n_i(z_1)\, n_j(z_2)\, \xi^s[\mu, r(\theta, z_1, z_2)], \qquad (27)$$

where n_i is the normalized redshift distribution of sample i
(and i = j for the auto-correlation). The galaxy separation
r is a function of the angular separation of the galaxies θ
and their redshifts z_1 and z_2 (as is μ).
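The multipole expansion of Eqs. 21-24 can be sketched as below. This is a minimal illustration under stated assumptions: f is taken to be the logarithmic growth rate of Hamilton (1992), since it is not defined in this section, and `xi_bar` and `xi_bar_bar` are hypothetical names for the volume-averaged quantities ξ' and ξ'' of Eqs. 25 and 26, which are supplied by the caller rather than integrated here.

```python
def hamilton_prefactors(b, f):
    """Bias and growth-rate prefactors of the monopole, quadrupole and hexadecapole."""
    c0 = b ** 2 + 2.0 * b * f / 3.0 + f ** 2 / 5.0
    c2 = 4.0 * b * f / 3.0 + 4.0 * f ** 2 / 7.0
    c4 = 8.0 * f ** 2 / 35.0
    return c0, c2, c4

def legendre_0_2_4(mu):
    """Legendre polynomials P_0, P_2 and P_4 evaluated at mu."""
    p0 = 1.0
    p2 = 0.5 * (3.0 * mu ** 2 - 1.0)
    p4 = (35.0 * mu ** 4 - 30.0 * mu ** 2 + 3.0) / 8.0
    return p0, p2, p4

def xi_redshift_space(mu, xi, xi_bar, xi_bar_bar, b, f):
    """Eq. 21, given xi(r), xi'(r) and xi''(r) evaluated at the same r."""
    c0, c2, c4 = hamilton_prefactors(b, f)
    p0, p2, p4 = legendre_0_2_4(mu)
    xi0 = c0 * xi
    xi2 = c2 * (xi - xi_bar)
    xi4 = c4 * (xi + 2.5 * xi_bar - 3.5 * xi_bar_bar)
    return xi0 * p0 + xi2 * p2 + xi4 * p4
```

A quick consistency check on the prefactors is that they sum to (b + f)^2, the line-of-sight Kaiser amplification.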



3.3 Correcting Spurious Clustering 

Observational effects may cause spurious fluctuations in the
number of observed LGs. As first derived (for large scale
clustering) in Ho et al. (in prep), to first order, the systematic
effect contributed by i observational parameters on the
observed density field is given by

$$\delta^o_g = \delta^t_g + \sum_i \epsilon_i \delta_i \qquad (28)$$

where δ^o_g is the over-density of galaxies we observe, δ_i is
the over-density of the systematic i, δ^t_g is the true over-density
of galaxies, and ε_i assumes a linear relationship
between the potential systematic and its effect on the observed
over-density of galaxies. According to Eq. 16, w(θ) =
⟨δ_i δ_j Θ_{i,j}(θ)⟩. Thus,

$$w^o_g(\theta) = w^t_g(\theta) + \sum_i \epsilon_i^2 w_i(\theta) + \sum_{i,j>i} 2\, \epsilon_i \epsilon_j w_{i,j}(\theta) \qquad (29)$$

and

$$w^o_{g,i}(\theta) = \epsilon_i w_i(\theta) + \sum_{j \neq i} \epsilon_j w_{i,j}(\theta). \qquad (30)$$

We can measure w^o_g(θ) (the auto-correlation function of our
galaxy sample), w^o_{g,i}(θ) (the cross-correlation function of our
galaxy sample with systematic i), and w_{i,j}(θ) (the auto-/
cross-correlation function of the systematics). Thus we always
have i + 1 equations and unknowns (ε_i and w^t_g(θ)) and
we can therefore solve for w^t_g(θ). We present the solutions
for three systematics in Appendix B. We note that measuring
the cross-correlation between observational parameters
and galaxies in order to identify potential systematic errors
has been applied to SDSS data since its early-data release
(Scranton et al. 2002).
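For a single angular bin, the system described above is linear in the coefficients and can be solved directly. The sketch below assumes the linear model of this section, with the true galaxy field uncorrelated with the systematics, so that the cross-correlations satisfy w_{g,i} = Σ_j ε_j w_{i,j} and the observed auto-correlation exceeds the true one by the quadratic form Σ_{i,j} ε_i ε_j w_{i,j}. The function name and matrix layout are illustrative; this is not the Appendix B solution itself.

```python
import numpy as np

def correct_w(w_obs, w_cross, w_sys):
    """Solve for the systematic coefficients and the corrected w in one theta bin.

    w_obs   : observed galaxy auto-correlation in this bin (scalar).
    w_cross : array (n_sys,), galaxy x systematic cross-correlations.
    w_sys   : array (n_sys, n_sys), symmetric matrix of systematic auto-
              (diagonal) and cross- (off-diagonal) correlations.
    """
    w_sys = np.asarray(w_sys, dtype=float)
    w_cross = np.asarray(w_cross, dtype=float)
    eps = np.linalg.solve(w_sys, w_cross)      # coefficients epsilon_i
    w_true = w_obs - eps @ w_sys @ eps         # remove the spurious contribution
    return eps, w_true
```

With a single systematic this reduces to ε = w_{g,s}/w_s and a corrected amplitude of w^o_g - (w_{g,s})^2/w_s.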



4 SYSTEMATIC EFFECTS ON THE 

ANGULAR DISTRIBUTION OF GALAXIES 

Effects such as seeing, Galactic extinction, and sky brightness
may affect the number density of galaxies we select. Another
important consideration is the presence of foreground
objects. As shown in Fig. 4 of Aihara et al. (2011), the presence
of foreground objects has a significant effect on the
number density of background objects one is able to detect.



4.1 Foreground Stars 

In order to investigate the effect of foreground stars on the
observed density of LGs, we determine the number density
of LGs in the immediate vicinity of stars within our masked
footprint. In annuli of width 1" around each star, we determine
the number density of LGs as a function of the maximum
radius of the annulus. In the top-left panel of Fig. 3,
we present this measurement when considering stars with
17.5 < i_mod < 19.9 and dividing the LG sample into five
i_cmod bins. We find that the presence of a star has a significant
effect on the ability to observe LGs with i_cmod > 18.5
out to at least 10" (we note that there are only 22,000 LGs
with i_cmod < 18.5, making the results in this bin relatively
uncertain). The effect remains nearly constant as a function
of the i_cmod magnitude, but it is strongest for the faintest
sample (displayed in orange).

We also find that for i < 20.1, the i-band magnitude
of the star does not produce a strong effect. This is shown
in the top-right panel of Fig. 3, where we take the full LG
sample and find the number density of these objects around
six separate samples of stars that we have divided, based on
model magnitudes, into bins between 19 < i < 20.5. We find
similar results for the bins with 19 < i < 20.1 (the black,
red, blue, and green points and lines). The effect becomes
significantly weaker for stars with 20.1 < i < 20.3. For the
20.3 < i < 20.5 bin, the effect is removed when the annuli
have outer radii of at least 5". When the outer radii are 3" and
4", there is an excess of LGs. It is possible that this excess is
caused by (resolved) binary stars: many stars are members
of binary systems, so there is an enhanced likelihood
that an object within a few arcseconds of a star is also a
star, and therefore the stellar contamination rate in our LG
sample will be higher around stars.

We find a deficit of LGs around stars with i_mod < 20.3
to at least 10" from the star. This suggests that the extended
seeing disc of a star increases the sky noise in its vicinity
and therefore makes object detection less likely. This implies
that the effect should depend both on the seeing during the
observation and the surface brightness of the object that
might be detected. In the bottom-left panel of Fig. 3, we
use the full LG sample and stars with 17.5 < i < 19.9 and
divide the imaging area into six regions based on seeing (we
note the median seeing in our masked footprint is 1".07).
As expected, we find that the deficit of LGs close to stars
becomes greater as the seeing becomes poorer. However,
there is still a significant deficit of LGs close to stars at all
levels of seeing.

In the bottom-right panel of Fig. 3, we find the number
density of LGs around stars with 17.5 < i < 19.9 when we
divide the LGs into six bins based on the i_fib2 magnitude
of the LG. We find dramatic differences, as we find a large
excess for the brightest LGs (black, i_fib2 < 20.5) and the
largest deficit for the faintest LGs (magenta, i_fib2 > 21.5).
The i_fib2 magnitudes are a measure of the flux within a
constant aperture, and are therefore a measure of the surface
brightness of the LG. Thus, as expected, we find that the
presence of a star has the greatest effect on the objects with
the lowest surface brightness. We find a large excess of LGs
with i_fib2 < 20.5 between 2" and 5" from stars. We believe
this excess is caused by binary companions to the stars we are
testing against, as the most compact objects will have the
highest surface brightness (at constant i_mod) and are also
most likely to be morphologically similar to stars.

Figure 3. The number density of galaxies we select around stars with 17.5 < i_mod < 19.9 (unless otherwise noted), divided by the
average number density, in 1" wide annuli, plotted against the maximum radius of the annulus. The top-left panel displays the results
when we divide the galaxy sample into the noted i_cmod magnitude bins. The top-right panel displays the case where we use the full
galaxy sample and find the number densities around stars within the noted i-band magnitude bins. In the bottom-left panel, we display
the result when we restrict to imaging regions with the labeled seeing. In the bottom-right panel, we divide the LGs into bins based on their
i-band magnitude within a 2" aperture (i_fib2). Errors are Poisson.

Fig. 3 suggests that every star effectively removes a
small amount of imaging area. If this effect is not corrected
for, we would expect an anti-correlation between LGs and
stellar density. However, 4% of the objects in our catalog
are stars, implying that, with no correction, there should
be a positive correlation between LG and stellar density.
The bottom-left panel of Fig. 4 presents the relationship
between LG density and the density of stars selected from
DR8 with 17.5 < i_mod < 19.9. When we equally weight
every object (black), we see a slight decrease (~3% across
the full range) in the number of objects as the stellar density
increases. This suggests the foreground presence of stars
(which removes objects from our catalog) dominates over the
increase in objects we select (erroneously) as galaxies due to
stellar contamination. If we instead weight each object by
the probability that an object is a galaxy, p_sg (as estimated
in Section 2.2; red), we find a significant and monotonic decrease
(totalling 10%) in the number of LGs as a function of
stellar density. The upper portion of the bottom-left panel
of Fig. 4 displays the fraction of our (masked) imaging area
where the number density of stars is below n_star. It shows
that the majority of the data (60%) has n_star < 2000 deg^-2,
but there are still data (5%) with n_star > 6000 deg^-2.

Figure 4. Each of the six panels is divided such that the bottom portion displays the number density, divided by the average number
density, as a function of the observational parameter. Individual points are connected by solid lines and shown with error bars whose
sizes are calculated assuming Poissonian statistics. Black represents equally weighted LGs (which can be both galaxies and stars) and red
represents weighting each LG by the probability it is a galaxy, p_sg. The results plotted in blue show the scenario when we weight each LG by p_sg and
subtract the effective area of stars, A_star, from each pixel for each star. Green displays the case where we change the selection criteria to
d_⊥ > 0.5564 for objects in the South, weight each LG by p_sg, and subtract A_star. Orange represents the application of iterative weights
to the LG density field, in an attempt to remove all of the fluctuations. The top portions of each panel display the fraction of the imaging
area where the observational parameter is less than the value on the x-axis. From top-left to bottom-right, the observational parameters
are: sky background in i-band magnitudes/arcsec^2 (sky); i-band airmass (airmass); the estimated offset in d_⊥ in magnitudes (d_⊥ offset);
number density of stars with 17.5 < i < 19.9 (n_star); r-band Galactic extinction in magnitudes (A_r); and i-band psf fwhm in arcseconds
(seeing). Errors are Poisson.

Foreground stars appear to remove area from the survey.
Integrating 2π(1 - n/n_ave(θ))θ, we can estimate the
effective area lost per star due to the occultation effect. For
the stars with 19.3 < i < 19.6, this yields an effective area of
67.2 square arcseconds and thus an effective radius of 4".6.
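The integral above can be sketched as a sum over the 1" annuli in which the density ratio is measured. The treatment of n/n_ave as constant within each annulus and the function names are assumptions of this illustration; no measured profile from Fig. 3 is reproduced here.

```python
import math

def effective_area(outer_radii, density_ratio):
    """Integrate 2*pi*(1 - n/n_ave)*theta over annuli with piecewise-constant n/n_ave.

    outer_radii   : increasing outer radii of the annuli (arcseconds).
    density_ratio : n/n_ave measured in each annulus.
    Returns the effective lost area in square arcseconds.
    """
    area = 0.0
    inner = 0.0
    for outer, ratio in zip(outer_radii, density_ratio):
        annulus_area = math.pi * (outer ** 2 - inner ** 2)
        area += (1.0 - ratio) * annulus_area
        inner = outer
    return area

def effective_radius(area):
    """Radius of the circle with the given area."""
    return math.sqrt(area / math.pi)

# The quoted 67.2 square arcseconds corresponds to a radius of about 4.6 arcseconds.
print(round(effective_radius(67.2), 1))
```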

Alternatively, we can assume that each star removes an
effective area, which we denote 'A_star', and we determine
A_star by finding the radius, r_star, which makes the values
displayed in the bottom-left panel of Fig. 4 closest to 1.
We find that the χ², using Poisson errors and the model
n/n_tot = 1, is minimized for r_star = 9".48. This r_star is
determined using only stars with 17.5 < i < 19.9. Fig. 3
suggests that stars with i-band magnitudes as faint as 20.3
have an effect, and there are an additional 6.3 million stars
with 19.9 < i < 20.3 within our footprint. Scaling r_star to
account for these additional stars yields an effective circular
area of radius 8".44. This is still far greater than the value
of ~ 5" we expect based on integrating 2π(1 - n/n_ave(θ))θ
given the n/n_ave(θ) relationships displayed in Fig. 3. Thus,
the n/n_tot(n_star) relationship (displayed in the bottom-left
panel of Fig. 4) is stronger than one might expect, suggesting
there are additional effects due to stellar density beyond the
occultation effect. This issue is studied in further detail in
Ross et al. (in prep.).

We proceed by assuming each star effectively masks an
amount of area consistent with n/ntot(nstar) = 1. For our
full LG sample and stars with 17.5 < i < 19.9, we determined
rstar ~ 9.48". This radius implies that stars are effectively
removing a total area of 500 square degrees, which is 5% of
our masked footprint. The resulting n/ntot(nstar) relationship
is displayed in blue in the bottom-left panel of Fig. 4.
We note that this effective radius is likely to depend on the
magnitudes of the LGs, so any subsets of the data are likely
to have different rstar.
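The effect of subtracting an area Astar per star can be illustrated with the minimal sketch below. It assumes the sky is divided into pixels of known area and that the corrected galaxy density in a pixel is the galaxy count divided by the pixel area minus the area attributed to stars. The pixel counts are invented for illustration; only rstar = 9.48" is taken from the text.

```python
import numpy as np

ARCSEC2_PER_DEG2 = 3600.0 ** 2

def star_area_deg2(r_star_arcsec):
    """Area (deg^2) of a circle of radius r_star_arcsec."""
    return np.pi * r_star_arcsec ** 2 / ARCSEC2_PER_DEG2

def corrected_density(n_gal, n_star, pix_area_deg2, r_star_arcsec=9.48):
    """Galaxy density per pixel after removing Astar for every star in the pixel."""
    a_star = star_area_deg2(r_star_arcsec)
    eff_area = pix_area_deg2 - np.asarray(n_star, dtype=float) * a_star
    return np.asarray(n_gal, dtype=float) / eff_area

# Hypothetical pixels of 1 deg^2: one without stars, one with 1000 stars.
density = corrected_density(n_gal=[100, 100], n_star=[0, 1000], pix_area_deg2=1.0)
```

Pixels with more stars have a smaller effective area, so the same galaxy count corresponds to a higher corrected density.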

The relationship between galaxy density and stellar
density is important because stars display significant
clustering on large angular scales (see, e.g., Myers et al.
2006); the stars may therefore affect the measured clustering
of galaxies at large physical scales. The auto-correlation
function, w(θ), calculated as described in Section 3.1, of
stars (with 17.5 < i < 19.9) is displayed in black triangles
in the top panel of Fig. 5. The amplitudes are significant,
and exhibit a monotonic decrease from ~0.4 at θ = 1°
to ~0 at θ = 50°. The cross-correlation of the stars with
the psg-weighted LGs, displayed in black triangles in the
bottom panel of Fig. 5, is significant and negative and
increases towards 0 in a manner that mirrors the decrease in
the star auto-correlation function. This implies that, if it is
unaccounted for, the presence of stars will cause systematic
errors on the measured large-scale clustering of LGs.

We note that foreground stars will be a problem for any
current or future large-scale-structure survey, and the problem
will only become more significant as limiting magnitudes
are pushed fainter and there are thus more foreground objects
that may have a masking effect. Foreground galaxies
will cause the same problem, but will have a much smaller
effect on the measured clustering, since at large scales the
angular clustering amplitudes of foreground galaxies are
significantly smaller than those of stars.

Figure 5. Top panel: The auto-correlation functions of the
density field of the stars (black triangles), r-band extinction
(Ar; red squares), i-band sky background (sky; blue circles),
i-band PSF FWHM (seeing; green open triangles), i-band airmass
(orange open squares), and the estimated offset in d⊥ (magenta
open circles). Bottom panel: The same observables, but
cross-correlated with the psg-weighted LGs. In both panels, the
error-bars are the estimated jack-knife errors, and the results
using sky, seeing, airmass, and the d⊥ offset have been
multiplied by 10.



4.2 Observational Parameters 

We find that the number density of LGs varies with
observational parameters such as the Galactic extinction, the
seeing, and the brightness of the background sky. In Fig. 4,
we display the number density, divided by the total average
number density, of LGs as a function of the value of the
potential systematic, for the cases where we equally weight
each object (black), we weight each object by its value of
psg (red), and we weight each object by its value of psg and
subtract an area Astar from each pixel for every star in the
pixel (blue). In the top portion of each panel, we display the
fraction of the imaging area, farea, which has a value
of the potential systematic that is lower than the value on
the x-axis, i.e., the median value of the potential systematic
is at farea = 0.5. The median and inner/outer quartile
values, within our masked footprint, of Galactic extinction,
seeing, sky background, and airmass are determined using
this information and displayed in Table 1.

Table 1. The median and inner/outer quartile values of
observational parameters in our masked footprint.

parameter         median            inner, outer quartile
Ar                0.082 mag         0.053, 0.126 mag
seeing            1.07"             0.96", 1.19"
sky background    20.27 mag/as^2    20.43, 20.11 mag/as^2
airmass           1.16              1.08, 1.26
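The median and quartile values correspond to the points where the cumulative area fraction farea reaches 0.5, 0.25 and 0.75. A minimal sketch of that calculation follows, assuming each pixel carries one value of the parameter and a known unmasked area. The function name and the toy arrays are illustrative assumptions, not part of the analysis code used here.

```python
import numpy as np

def area_weighted_quantile(values, areas, q):
    """Smallest parameter value at which the cumulative area fraction reaches q."""
    values = np.asarray(values, dtype=float)
    areas = np.asarray(areas, dtype=float)
    order = np.argsort(values)
    v, a = values[order], areas[order]
    f_area = np.cumsum(a) / np.sum(a)
    # Clamp the index to guard against round-off in the last cumulative value.
    idx = min(int(np.searchsorted(f_area, q)), len(v) - 1)
    return float(v[idx])

# Toy example: four equal-area pixels.
median_equal = area_weighted_quantile([1.0, 2.0, 3.0, 4.0], [1, 1, 1, 1], 0.5)
# Toy example: one pixel dominates the area.
median_skewed = area_weighted_quantile([1.0, 2.0, 3.0], [1, 1, 8], 0.5)
```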

The bottom-middle panel of Fig. 4 plots the relationship
between the number density of LGs and the Galactic
extinction in the r-band, Ar (we use the Ar values from the
CAS, which are based on the Schlegel, Finkbeiner & Davis
1998 dust maps and the relationship Ar = 2.751 E(B - V)).
With equal weighting (black; we note these data are nearly
indistinguishable from those represented in blue and green),
the number density of LGs increases slightly as a function of
Ar. The Ar values and stellar densities are highly correlated
(since they both trace the structure of our Galaxy), but this
does not result in n/ntot(Ar) resembling the n/ntot(nstar)
relationship. Interestingly, the relationship flattens when we
weight each object by psg, but reverts to its original form
when we additionally subtract Astar. From the top portion
of the panel, we can see that the values of Ar vary
smoothly between 0.03 < Ar < 0.2 and that its median
value is Ar = 0.08.

The auto-correlation function of Ar is displayed in red
squares in the top panel of Fig. 5. The amplitudes are
significantly non-zero and show a similar trend to the stars.
Interestingly, the amplitudes of the Galactic extinction w(θ)
are significantly smaller than those of the star w(θ),
suggesting there is more structure to the distribution of stars
in the Galaxy than to the dust in the Galaxy. The
cross-correlation function of Ar with the psg-weighted LGs (red
squares in the bottom panel of Fig. 5) is negative (except
at the largest scales), but consistent with zero at a majority
of scales. Interestingly, the absolute values of this
cross-correlation function are very similar in amplitude to
those of the cross-correlation function we measure between Ar
and the LGs when we subtract the effective area of stars. Even
though the n/ntot(Ar) relationship displays a significant
change between the two treatments, we find no evidence
for a systematic effect on the measured clustering.

We also see significant changes in the number density as
a function of the seeing. This result is shown in the
bottom-right panel of Fig. 4. There is a 9% decrease in the LG
density between regions with seeing 1.2" and those with
1.6". We note that this range covers less than 25% of the
imaging, as the upper quartile of seeing within our footprint
is 1.19". The reason for the decrease in LG number density in
poor seeing is that the star/galaxy separation cut applied
to BOSS targeting, given by Eq. 2.1, effectively changes.
Increasing the seeing causes the PSF and model magnitudes
to converge. The result is that in regions of poor seeing the
two magnitudes are more similar (not because the object is
too point-like, but rather because the PSF is too extended)
and the cut is more likely to reject objects.

The green open triangles in the top panel of Fig. 5
present the measured w(θ) of the seeing. Its amplitudes are
significantly smaller than those of either stars or Ar, though
they are non-zero. This may be unexpected, as regions of
similar seeing should follow the scanning pattern in the sky.
However, in regions of sky that were imaged multiple times
(roughly 50% of the DR8 footprint), the imaging with better
seeing is chosen. This works to alleviate any large-scale
pattern in the seeing within our footprint and also reduces the
median seeing to 1.07". The cross-correlation function of the
seeing and LGs is displayed in the bottom panel of Fig. 5. The
amplitudes are consistent with zero but transition from being
negative at smaller scales to positive at larger scales.

The sky background has a complex relationship with
measured flux errors, and we may therefore expect the number
density of observed LGs to depend on the sky background.
The top-left panel of Fig. 4 displays a 5% increase in
LG density between regions with an i-band sky background
of 20.7 magnitudes/arcsec^2 (mag/as^2) and those with a
background of 20.5 mag/as^2. This implies that the observed
trend may be due to an increase in the average magnitude
error scattering more objects into than out of our sample.
However, 70% of the footprint has a sky brightness between
20.5 and 20.0 mag/as^2 (as shown in the top right panel of
Fig. 4) and the fluctuations are only ~1% in this range.
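The sky brightness values quoted in mag/as^2 follow from the CAS fluxes in nanomaggies/arcsec^2 through m = 22.5 - 2.5 log(f), where the logarithm is base 10. The conversion and its inverse can be sketched as follows; the function names are illustrative.

```python
import math

def sky_mag_per_arcsec2(flux_nmgy_per_arcsec2):
    """Convert a sky flux in nanomaggies/arcsec^2 to magnitudes/arcsec^2."""
    return 22.5 - 2.5 * math.log10(flux_nmgy_per_arcsec2)

def sky_flux_nmgy(mag_per_arcsec2):
    """Inverse conversion: magnitudes/arcsec^2 to nanomaggies/arcsec^2."""
    return 10.0 ** ((22.5 - mag_per_arcsec2) / 2.5)

# A sky of 10 nanomaggies/arcsec^2 corresponds to 20.0 mag/as^2.
m_example = sky_mag_per_arcsec2(10.0)
```

Because magnitudes decrease as flux increases, a background of 20.5 mag/as^2 is brighter than one of 20.7 mag/as^2.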

The auto-correlation function of the sky background is
displayed in blue circles in the top panel of Fig. 5. It is
significantly positive, but is only ~1/20th of that of the stars.
The cross-correlation of the sky background with the LGs,
displayed in the bottom panel of Fig. 5, is significantly
positive and ~1/10th as large as the auto-correlation function
of the sky background. This is the largest ratio between the
auto- and cross-correlation functions of any of the potential
systematics we measure. This suggests that the increase in
LG number density between 20.7 and 20.5 mag/as^2 is
related to a significantly positive cross-correlation function.

For an object of given brightness, the number of photons
that make it to the CCD depends on the airmass. One
may therefore also expect that the magnitude error will
depend on the airmass. The top-middle panel of Fig. 4 displays
the relationship between the number density of LGs and the
airmass. We do not find smooth variations; rather, we find a
sharp increase in the number density where the airmass is
approximately 1.35 and where it is greater than 1.5, and also
a decrease where it is less than 1.05. This suggests that the
fluctuations are not tied to the physical effect of the value of
the airmass, but rather that these specific values of the
airmass are correlated with other observational parameters.

[Footnote: The CAS gives sky background values, f, in terms of
the flux unit of 'nanomaggies/arcsec^2', which we convert to
magnitudes/arcsec^2, m, via m = 22.5 - 2.5 log(f) as implied by
http://data.sdss3.org/datamodel/glossary.html#nanomaggies]

The auto-correlation function of the airmass and its
cross-correlation function with LGs (orange open squares
displayed in Fig. 5) are quite similar to those of the sky
background, especially at scales greater than 4°. Further, the
geometric mean of the sky and airmass auto-correlation
functions is nearly identical to their cross-correlation function.
This suggests that the two fields are nearly fully co-variant
in terms of the information they provide on the large-scale
clustering of LGs. One might expect that the two would be
related, as the airmass and, in general, the sky background
are higher closer to the horizon. (The sky background should
also depend on, e.g., the phase of and proximity to the Moon
and the azimuthal angle.) However, it is only the large-scale
patterns that are similar; the local effects on the density
field (as displayed in Fig. 4) clearly differ.
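The comparison between a cross-correlation function and the geometric mean of two auto-correlation functions can be written as a single ratio, sketched below. A ratio near 1 at a given scale indicates that the two fields are close to fully co-variant there. The function name and the toy amplitudes are illustrative assumptions, not measured values.

```python
import numpy as np

def covariance_ratio(w_cross, w_auto_a, w_auto_b):
    """Cross-correlation divided by the geometric mean of the auto-correlations."""
    w_cross = np.asarray(w_cross, dtype=float)
    geo_mean = np.sqrt(np.asarray(w_auto_a, dtype=float) *
                       np.asarray(w_auto_b, dtype=float))
    return w_cross / geo_mean

# Toy amplitudes for two fields that are exact rescalings of one another.
w_a = np.array([0.04, 0.01])
w_b = np.array([0.09, 0.0225])
w_ab = np.array([0.06, 0.015])
ratio = covariance_ratio(w_ab, w_a, w_b)
```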

Finally, we use the results of Schlafly et al. (2010), who
found color offsets (caused by some combination of errors
in the Galactic dust map and/or photometric calibration
errors) for SDSS data based on the blue tip of the stellar
locus, to make a map of the offset in d⊥. We note that ~15%
of the imaging in the South was not available at the time
these maps were made and that one may expect the color
of the blue tip to vary with the average metallicity of the
stars that are used (which will vary as a function of position
in the Galaxy). Regardless, any fluctuations we find may be
important. We test against the implied offset in d⊥, as small
changes in the d⊥ > 0.55 cut have a large effect on the number
of objects we select into our sample (see Section 4.4). We
find a slight excess in the number of LGs
at both low and high values of the offset (displayed in the
upper-right panel of Fig. 4). The auto-correlation of the d⊥
offset is significant and ~1/10th that of Ar, but its
cross-correlation function with the LGs is consistent with zero
at nearly all of the scales we measure (both are displayed using
magenta open circles in Fig. 5).

4.3 Eliminating Systematic Errors 

We have investigated six potential sources of systematic
errors (foreground stars, Galactic extinction, seeing, sky
background, airmass, and photometric offsets) and we find
different fluctuations in LG density associated with each. The
auto-correlation functions of these potential systematics and
their cross-correlation functions with the LGs suggest that
stars have the greatest potential to cause systematic
deviations in the measured clustering, and that we may have to
worry about sky background fluctuations as well.

There are at least three procedures one can use to cor- 
rect for these sources of potential systematic error. The first 
is to mask area of the sky based on the value of the obser- 
vational parameter. For instance, we have already masked 
areas with E(B - V) > 0.08 and seeing > 2.0". As we
described in Section 4.1, masking may be effective for
removing the effects of foreground stars. However, for the other
potential systematics, there will remain fluctuations in the 
number density of LGs no matter the value of the cut we 
make on the systematic. 

A second option is to find the combination of weights
one can apply to remove the fluctuations one finds in the LG
number density. This application is straightforward in the
case that the effects are uncorrelated: each galaxy would
just be weighted by the reciprocal of the function plotted
in the bottom panels of Fig. 4. However, we find significant
correlation between the effects, making this process
non-trivial.

Figure 6. Top panel: The correction, C, calculated as described
by Eqs. 29 and 30, to the auto-correlation of the full LG
catalog, when considering: stars only (black triangles); stars
and sky background (blue circles); stars, sky background, and
Galactic extinction in the r-band (Ar; red squares); and stars,
sky background, and seeing (green open triangles). Bottom panel:
points with error-bars (calculated by propagating the jack-knife
errors on the auto- and cross-correlation functions) display the
value of ε (see Section 3.3) for stars (black triangles), sky
background (blue circles), extinction (red squares), seeing
(open green triangles), and airmass (open orange squares). The
solid lines of corresponding color represent the best-fit
constant value of ε for each respective systematic.

The correlation between possible systematics can be
accounted for by iteratively applying the weights. For
instance, one may find the weights based on stellar density,
Wstar(nstar), by taking the reciprocal of what is plotted in
red squares in Fig. 4 and then find n/ntot(Ar) while
applying Wstar to the density field, the reciprocal of which
is (the independent) WA(Ar). One may then proceed likewise
through all potential systematics. The disadvantage to
this approach is that it assumes each effect is fully separable
(i.e., that we can express W[nstar, Ar] = X[nstar]Y[Ar])
and in the (realistic) case that this is not 100% true, the
order in which the weights are determined will matter.
The advantage is that this method does not require a linear
relationship between the potential systematic and the LG
density fields and that it is straightforward to apply to as
many potential systematics as can be identified. The resulting
n/ntot(sys) relationships when we determine the weights
iteratively, in the order nstar, airmass, seeing, Ar, d⊥ offset,
sky, are displayed in orange in Fig. 4. Only for nstar does the
effect of applying subsequent weights cause a significant
deviation from n/ntot(sys) = 1. From here on, we will refer to
the application of these weights as the "Weights" method.

Figure 7. The measured angular auto-correlation, w(θ), measured
by weighting each LG by psg, multiplied by θ, for our full
sample of LGs. In the left-hand panel, the black triangles show
the measurements with no corrections and the fiducial mask, the
red squares mark the measurement when we correct for stars (as
described in Section 3.3), and the blue circles display the
measurement when we make the Astar correction. In the right-hand
panel, the open green triangles display the measurement we
obtain when we correct for stars and sky background
(Cstar,sky), the open orange squares display the results when
we make the Astar correction and additionally correct for sky
background (-Astar, Csky), the open magenta circles show the
w(θ) we measure when we apply iterative weights to the LG
density field (Weights), and the cyan stars show the result
when we apply a Cstar correction in addition to the weights.
The solid line displays the model w(θ) for our assumed
cosmology and b = 2.
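The iterative weighting can be sketched as follows, assuming pixelized maps: per-pixel galaxy counts, pixel areas, and one value per pixel for each potential systematic. For each systematic in turn, the weighted n/ntot is measured in bins and its reciprocal is multiplied into the running weights. The binning scheme, function names and toy data are illustrative assumptions, not the implementation used for Fig. 4.

```python
import numpy as np

def density_ratio(gal, area, sys_map, edges, weight):
    """Weighted n/ntot in bins of one systematic; returns bin index and ratio."""
    idx = np.clip(np.digitize(sys_map, edges) - 1, 0, len(edges) - 2)
    nbins = len(edges) - 1
    counts = np.bincount(idx, weights=gal * weight, minlength=nbins)
    areas = np.bincount(idx, weights=area, minlength=nbins)
    mean = np.sum(gal * weight) / np.sum(area)
    ratio = np.ones(nbins)
    good = (areas > 0) & (counts > 0)   # leave empty bins at a ratio of 1
    ratio[good] = counts[good] / areas[good] / mean
    return idx, ratio

def iterative_weights(gal, area, sys_maps, edges_list):
    """Apply reciprocal n/ntot weights for each systematic, in the order given."""
    gal = np.asarray(gal, dtype=float)
    area = np.asarray(area, dtype=float)
    weight = np.ones_like(gal)
    for sys_map, edges in zip(sys_maps, edges_list):
        idx, ratio = density_ratio(gal, area, np.asarray(sys_map), edges, weight)
        weight = weight / ratio[idx]
    return weight

# Toy data: density depends on the first systematic only.
gal = np.array([90.0, 90.0, 110.0, 110.0])
area = np.ones(4)
sys1 = np.array([0.0, 0.0, 1.0, 1.0])
sys2 = np.array([0.0, 1.0, 0.0, 1.0])
edges = [-0.5, 0.5, 1.5]
w = iterative_weights(gal, area, [sys1, sys2], [edges, edges])
```

Because each step uses the weights from the previous steps, changing the order of `sys_maps` can change the result when the effects are not fully separable.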

A third option is to use the cross-correlation technique 
described in Section 3.3 to estimate and eliminate the spu- 
rious signal imparted by any number of observational pa- 
rameters. We will refer to this method as the "correction" 
technique. The top panel of Fig. 6 displays the magnitude 
of the correction, calculated as described in Section 3.3, for 
different combinations of observational parameters. The cor- 
rection for the stars alone (black triangles) decreases from 
~ 10^^ to ~ 4 X 10^" from 1° to 20°. Additionally correct- 
ing for sky background (blue circles) marginally increases 
the correction at large scales but increases the correction by 
~30% at 1°. Adding seeing (red squares) or Ar (green stars)
corrections has only a marginal effect, which is most notable
at small scales.

In the bottom panel of Fig. 6, the value of ε (as defined
in Section 3.3) is displayed as a function of angle for
stars (black triangles), sky background (blue circles), Ar
(red squares), seeing (open green triangles), and airmass
(open orange squares). The correction we apply requires that
ε be constant. The solid lines display the best-fit (constant)
value of ε. As can be seen, in every case a constant value
of ε is well within the error-bars, suggesting no need for
higher-order corrections. The values of ε are slightly more
constant for sky background than for airmass, and we have
found that they both trace the same large-scale clustering
pattern. For this reason, we use only the sky background
when calculating corrections.
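A constant ε can be fitted with an inverse-variance weighted mean, as in the sketch below. The sketch assumes, for a single systematic, that ε(θ) is the ratio of the LG-systematic cross-correlation to the systematic auto-correlation and that the spurious signal is ε² times that auto-correlation. These forms are assumptions made here for illustration; Section 3.3 gives the actual definitions, including the treatment of several systematics at once. All names and numbers in the code are hypothetical.

```python
import numpy as np

def epsilon_of_theta(w_cross, w_auto_sys):
    """Assumed definition: epsilon = w_(LG,sys) / w_(sys,sys) at each angle."""
    return np.asarray(w_cross, dtype=float) / np.asarray(w_auto_sys, dtype=float)

def best_fit_constant(eps, eps_err):
    """Inverse-variance weighted mean, i.e. the chi^2-minimizing constant."""
    ivar = 1.0 / np.asarray(eps_err, dtype=float) ** 2
    return float(np.sum(ivar * np.asarray(eps, dtype=float)) / np.sum(ivar))

def corrected_w(w_obs, w_auto_sys, eps):
    """Assumed single-systematic correction: subtract eps^2 * w_(sys,sys)."""
    return np.asarray(w_obs, dtype=float) - eps ** 2 * np.asarray(w_auto_sys, dtype=float)

# Toy numbers: a true signal contaminated with epsilon = 0.1.
w_true = np.array([0.010, 0.002])
w_sys = np.array([0.4, 0.1])
w_obs = w_true + 0.1 ** 2 * w_sys
w_cross = 0.1 * w_sys
eps_fit = best_fit_constant(epsilon_of_theta(w_cross, w_sys), [0.02, 0.05])
w_corr = corrected_w(w_obs, w_sys, eps_fit)
```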

Unless we note otherwise, in all cases we measure the
angular auto-correlation function of LGs, w(θ), by weighting
each object by the value of psg. This means that instead
of counting each LG equally, an LG is counted as psg, and
thus, at large smoothing scales, the estimated over-density
of LGs should be the true (observed) over-density of LGs.
In principle, this should remove the contamination of stars,
leaving only their systematic masking effect. We display this
psg-weighted measurement of w(θ) with no corrections,
multiplied by θ, for all of the LGs within our (masked) imaging
area with black triangles in the left-hand panel of Fig. 7.
The amplitudes at large scales are quite large, given that
generic models predict w(θ) ~ 0 for θ > 5°.

Subtracting Astar for each star within a pixel (blue circles
in the left panel of Fig. 7) significantly reduces the
large-scale amplitudes of w(θ). Notably, this result is virtually
identical to what we obtain when we do not subtract Astar
but instead apply the correction technique when accounting
for stars, as displayed with red squares in the left-hand panel
of Fig. 7 (the magnitude of this correction is displayed with
black triangles in Fig. 6). This suggests that either method
can be used, and the approach one uses should depend on its
ease with respect to the task at hand. Notably, the jack-knife
errors are smaller when we subtract Astar.

The LG w(θ) with the combined correction for stars and
sky background (Cstar,sky) is displayed by the open green
triangles in the right-hand panel of Fig. 7. Including the
sky background correction produces a small but noticeable
change (the measurements around 3° are closer to the black
line), which is almost identical to subtracting Astar and
correcting for sky background at scales less than 30° (orange
open squares; Astar, Csky). Including additional corrections
for seeing and Galactic extinction produces no discernible
change in the LG w(θ). Applying the Weights method to
the LG density field yields the measurements displayed with
open magenta circles in the right-hand panel. The results are
very similar to the other correction techniques at scales less
than 30°, but the measurements are slightly larger at scales
greater than 2° than either Astar, Csky or Cstar,sky. The
data displayed in orange in Fig. 4 suggest that the Weights
method may leave a residual dependence on stellar density.
If we subtract the correction for stellar density we find by
cross-correlating the weighted LG field with the stars, the
resulting effect on the w(θ) measurements (displayed with cyan
stars in the right-hand panel of Fig. 7) is minor. The
disagreement between the results at θ > 30° suggests that
significant systematic errors remain on the measurements at
these scales.

The black curve plotted in Fig. 7 displays the expected
clustering for our fiducial ΛCDM cosmology and a bias of 2.0
(calculated as described in Section 3.2). This model appears
generally consistent with all of the measurements at small
scales, but at scales greater than 2°, the uncorrected
measurements are significantly greater. We note that the feature
in the model at ~3.5° is due to the baryon acoustic
oscillations present in our fiducial P(k). Including corrections
for stars and sky background appears to make the measurements
generally consistent with the assumed cosmological
model, although all but one of the measurements remain
larger than the model between 3° and 12°.

4.4 North and South Galactic Caps 

The DR8 imaging data is separated into two distinct regions;
in Galactic coordinates these regions can be separated into
b > 0° 'North' and b < 0° 'South' (see Fig. 1). The fact that
these regions are spatially separated makes them more prone
to calibration errors, e.g., one might expect uncertainty in
the relative zero points of one or more bands, given the
lack of continuous photometry connecting the two regions.
Schlafly et al. (2010) and Schlafly & Finkbeiner (2010b) have
estimated the level of color offset between the North and
South in the SDSS. Schlafly & Finkbeiner (2010b) attribute
these differences to either calibration errors or errors in the
Galactic extinction corrections (or a combination of both).
Using results of the 'spectrum' based method (which is less
sensitive to changes in the metallicity of stars than the 'blue
tip' method) listed in the second row of Table 6 of Schlafly
& Finkbeiner (2010b), one can infer that the values of d⊥
(a combination of g - r and r - i colors defined by Eq. 7)
that we calculate are offset by 0.0064 magnitudes between
the North and South (the 'blue tip' method yields a similar
offset of 0.0045 magnitudes). These results suggest that,
assuming the values in the North are the true values, we
should subtract 0.0064 from the values we calculate in the
South to obtain a better estimate of their d⊥ values.

Figure 8. The measured angular auto-correlation functions
(w[θ]) for the full LG sample (black triangles), LGs in the
Northern Galactic Cap (blue circles), LGs in the Southern
Galactic Cap (open green triangles), and the LG sample we
obtain when we impose the cut d⊥ > 0.5564 for objects in the
South Galactic Cap (red squares). In each case, we subtract an
area Astar for each star in each pixel and correct for sky
brightness.

The inferred offset in d⊥ between the North and South
appears small, and given the difference between the
'spectrum' and 'blue tip' based methods, the uncertainty on the
correction may be relatively large. However, it is instructive
to determine the effect a 0.0064 magnitude offset in
d⊥ would have on our sample. If we change the cut such
that we accept only objects with d⊥ > 0.5564 in the South
(while using the fiducial cut in the North), we remove 5,172
objects from our sample. This reduces the number density
in the South to 108.7 deg^-2, which is still 1.5% greater
than the number density in the North. If we weight each
object by psg when calculating the number densities (which
should provide a better estimate of the true number density
of galaxies), the number density in the South decreases
to 103.2 deg^-2 and in the North it becomes 103.1 deg^-2.
Any uncertainty in the corrections we have made is almost
certainly larger than this 0.1 deg^-2 difference in number
density. The n/ntot(sys) relationships we obtain when we
use the d⊥ > 0.5564 cut in the South, subtract Astar, and
weight by psg are displayed in green in Fig. 4. This does not
cause major changes, but the relationships with airmass, Ar,
and d⊥ offset all become closer to one.
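A hemisphere-dependent cut of this kind, together with a psg-weighted number density, can be sketched as follows. The function names and example values are hypothetical; only the fiducial cut of 0.55 and the 0.0064 magnitude offset come from the text.

```python
import numpy as np

def select_lgs(d_perp, gal_b, fiducial_cut=0.55, offset_south=0.0064):
    """Boolean mask: fiducial d_perp cut in the North (b > 0), shifted in the South."""
    d_perp = np.asarray(d_perp, dtype=float)
    gal_b = np.asarray(gal_b, dtype=float)
    cut = np.where(gal_b < 0.0, fiducial_cut + offset_south, fiducial_cut)
    return d_perp > cut

def weighted_density(p_sg, area_deg2):
    """Number density (deg^-2) with each object counted as its p_sg."""
    return float(np.sum(p_sg)) / area_deg2

# Hypothetical objects: the same d_perp passes in the North but not in the South.
keep = select_lgs(d_perp=[0.553, 0.553, 0.560], gal_b=[30.0, -30.0, -30.0])
n_weighted = weighted_density(p_sg=[0.9, 1.0, 0.6], area_deg2=0.025)
```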

Given the color offsets and the change in number density
they imply, we might expect differences in the clustering
of the LGs in the North and South. Fig. 8 displays the w(θ)
we measure when we split the LG sample into North (blue
circles) and South (open green triangles) samples and apply
the Astar, Csky correction. Indeed, the clustering is different
in the two regions, as the measurements in the South
are smaller than those in the North. However, at scales less
than 30°, the w(θ) of the full sample (black triangles) simply
appears to be the weighted average of the two samples. At
the largest scales (θ > 30°), this is not the case, suggesting
that the color offset may cause a systematic effect on the
measurements at the largest scales (we note that the sky
background correction may have minimized this potential
systematic effect, confining it to these large scales).

We have found that using the cut d⊥ > 0.5564 in the
South removes the asymmetry in the number density of LGs
in the North and South. We therefore measure w(θ) of the
LG sample after making this cut, which is plotted with red
squares in Fig. 8. The result is quite similar to the result
obtained using the fiducial cuts (black triangles), but the
amplitudes at the largest scales are reduced and appear to
be closer to the weighted average of the North and South.
Interestingly, the size of the sky brightness correction depends
strongly on our particular treatment of the North and South.
We find εsky = 0.113 for the full sample, εsky = 0.068 when
we use the d⊥ > 0.5564 cut in the South, εsky = 0.027 in the
North sample, and εsky ~ 0.18 for the South sample.
This suggests that the systematic effect of sky brightness
is predominantly a feature of the Southern imaging.



4.5 Summary of Angular Fluctuations 

The results presented throughout this section suggest that 
stars cause major systematic errors on the clustering of 
SDSS DR8 LGs, and sky brightness may also cause signifi- 
cant errors. We have investigated variations in number den- 
sity caused by Galactic extinction, seeing, airmass, and color 
offsets, but found them to have minor effects. Perhaps most 
importantly, we have identified two separate ways to correct 
for systematic variations in the number density of galaxies 
caused by any potential systematic that can be quantified 
and turned into a map on the sky, and for the stars we have 
identified three separate ways to correct for their systematic 
effects. 

We note that other catalogs constructed from other 
imaging surveys, other SDSS data, or even subsets of the 
LG data will not necessarily display the same relationships 
we have found in this section. The tests we have performed 
must be repeated for any sample one uses to measure clus- 
tering. We further note that the systematics we investigate 
are by no means a complete list — there are likely to be 
effects we have not thought of. 
Figure 9. The normalized redshift distributions of BOSS CMASS
spectra when splitting into regions with r-band Galactic
extinction <Ar> = 0.08 (red) and 0.13 (black).



5 CONSTRUCTING THE PHOTOMETRIC 
REDSHIFT CATALOG 

We measure photometric redshifts for our LG sample using
the artificial neural network based photometric redshift
estimator ANNz (Firth et al. 2003). ANNz has been proven to
yield accurate and precise zphot estimates when the training
sample is representative of the full data set (see, e.g.,
Collister et al. 2007). The results of Abdalla et al. (2008) and
Thomas et al. (2011) suggest that neural-network based
photometric redshift estimators (such as ANNz) are the most
accurate in this specific situation. Our training sample
consists of 112,778 BOSS CMASS objects with spectroscopic
redshifts. This large training sample provides unprecedented
ability to ensure the training sample is representative of our
imaging data, while accounting for fluctuations in observing
conditions.

Fig. 9 displays the normalized redshift distributions for
data with <Ar> = 0.08 and <Ar> = 0.13 (which splits the
sample approximately in half). There is a significant
difference, as the objects in areas of the sky with low Galactic
extinction (red) have a larger median redshift and more
galaxies in the high-redshift tail of the distribution. This
result suggests that we may obtain better zphot estimates if we
include the Ar values in the training, which we can do because
the training data cover the entire range of Galactic extinction
found in our full sample. This finding implies that Galactic
extinction may be an important systematic when the clustering
of the BOSS spectroscopic sample is analyzed.

We have also found fluctuations in the spectroscopic
redshift distributions with seeing and sky background; these
fluctuations will be studied in detail in Ross et al. (in prep.).
Despite the fact that the training sample consists of over
100,000 objects, we do not find that it adequately covers the
range in sky background or seeing which would be required to
include these parameters in the photometric redshift training.
Instead, we repeat the tests we performed on the whole sample
in Section 4 on samples in zphot slices in Section 6.

5.1 Photometric Redshift Training 

We train ANNz to estimate photometric redshifts for our
LGs using methods similar to those of Collister et al. (2007). We
randomly divide our sample of CMASS spectra (keeping only
objects spectroscopically confirmed to be galaxies) such that
one quarter of the objects are designated as a training set, a
separate quarter as a validation set, and the remaining
half as a testing set (as is done in Collister et al. 2007).
We use the de-reddened model magnitudes in the
g, r, i, z SDSS imaging bands, and their errors. We do not
use the u-band because the results of Schlafly et al. (2010)
suggest there may be significant variations in the true u − g
color over the SDSS imaging area, and the u-band only
significantly aids Zphot estimation for the bluest galaxies in our
sample.
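The random division described above can be sketched as follows. This is only an illustration of a one-quarter / one-quarter / one-half split; the function name `split_sample` and the fixed seed are assumptions made for the example, not part of ANNz or of the procedure used in the text.

```python
import random


def split_sample(objects, seed=0):
    """Shuffle and split into (training, validation, testing) = (1/4, 1/4, 1/2).

    `seed` is a hypothetical parameter added so the example is reproducible.
    """
    shuffled = list(objects)
    random.Random(seed).shuffle(shuffled)
    quarter = len(shuffled) // 4
    training = shuffled[:quarter]
    validation = shuffled[quarter:2 * quarter]
    testing = shuffled[2 * quarter:]  # the remaining half (plus any remainder)
    return training, validation, testing
```

Keeping the three subsets disjoint is what allows the testing set to give an unbiased estimate of the Zphot dispersion.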

We test three different training samples that use the
following input parameters:

(i) Only the g, r, i, z (de-reddened) model magnitudes and 
their errors 

(ii) Including the Galactic extinction in the i-band, A_i, in
addition to the model magnitudes

(iii) Including the ratio of major to minor axes^1, a/b, of
the ellipse corresponding to the best-fit exponential profile in
the i-band in addition to the magnitude and A_i information

In case (i), the rms difference between Zspec and Zphot is
0.0610 for the full sample; this reduces to 0.0512 for galaxies
with c_∥ > 1.6. We find that 92% of the LGs pass this c_∥ > 1.6
restriction, which was used by previous studies such as
Collister et al. (2007). Such a restriction ensures a strong 4000 Å
break for galaxies at our target redshifts (0.4 < z < 0.7),
and Masters et al. (2011) find that this cut removes most
of the galaxies that would be morphologically classified as
late-type from the CMASS sample. For case (ii), we find
that including the A_i information produces insignificant
improvements, as the rms values become 0.0609 and 0.0511,
respectively. Case (iii) significantly improves the rms for
objects with c_∥ < 1.6, as the overall dispersion decreases to
0.0585, and we find a slight improvement for the c_∥ > 1.6
galaxies, as the rms decreases to 0.0506.

As the dispersion values suggest, there is a strong
correlation between the value of c_∥ and the accuracy with which
we can estimate Zphot. We display the relationship between
the rms dispersion in our testing set versus c_∥ in the middle
panel of Fig. 10 for our three different training samples.
The obtained relationship is extremely similar whether we
include A_i (case ii, red squares) or we do not (case i, black
triangles). Including a/b (case iii) reduces the rms for
objects with c_∥ < 1.8. However, all three cases show that the
most accurate Zphot estimates are obtained when c_∥ > 1.8.

We also find a significant correlation between the
accuracy of the Zphot estimates and a/b. Masters et al. (2011) have
discovered that ~5% of CMASS objects are edge-on spiral

^1 The data selected from the CAS are AB_exp, which are actually
b/a ratios.



galaxies where effects of dust obscuration are likely to be
significant. Logically, this likelihood is correlated with the
axis ratio. These dust-obscured spirals tend to be at lower
redshifts than the majority of CMASS objects (while having
similar colours; see Yip et al. 2011 for a full study of the
effects of inclination on photometric redshift estimates), and
thus the accuracy of the Zphot estimate and a/b are related.
We present this relation in the right panel of Fig. 10, which
shows that it is a smooth function of log10(a/b) for each of
our three training samples. Including a/b in the training
improves the Zphot accuracy for the largest values of a/b. The
values of a/b are correlated with c_∥, since it is disk galaxies
(which are generally bluer galaxies in c_∥) that have the
highest values of a/b. However, we find similar relationships
(though not as strong) when this correlation is accounted
for.

The dispersion is also correlated with the estimated
Zphot, as illustrated in the left panel of Fig. 10. Each of the
three training cases results in similar relationships. Including
a/b (case iii, blue circles) makes the largest difference
for Zphot estimates between 0.45 and 0.6. We see only minor
differences between cases i (model magnitudes only, black
triangles) and ii (including A_i, red squares).

The ANNz output includes a photometric redshift error
estimate, which we denote σ_ze. These reported errors
are correlated with the actual dispersion in Zphot vs. Zspec,
but they underestimate it by ~66% (as can
be determined by comparing the average value of σ_ze and
√⟨(Zphot − Zspec)²⟩ for any particular testing set). These
estimated errors do not recover the trends we discover between
the rms and c_∥ and a/b (which are displayed in the middle
and right panels of Fig. 10). Thus, the true uncertainty on
any individual Zphot estimate is a linear combination of the
estimated error, c_∥, and a/b. In Section 5.3, we describe how
we combine this information in order to estimate the
redshift distributions of the photometric redshift samples we
use.
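The comparison between the reported errors and the measured dispersion can be sketched as below. This is a minimal illustration, assuming the testing set is available as three equal-length lists; the function names and the toy numbers in the usage note are hypothetical and are not taken from the catalog.

```python
import math


def rms_dispersion(z_phot, z_spec):
    """sqrt(<(z_phot - z_spec)^2>) over a testing set."""
    squared = [(p - s) ** 2 for p, s in zip(z_phot, z_spec)]
    return math.sqrt(sum(squared) / len(squared))


def error_ratio(z_phot, z_spec, sigma_ze):
    """Mean reported error divided by the measured rms dispersion.

    A value below one means the reported errors underestimate the scatter.
    """
    mean_reported = sum(sigma_ze) / len(sigma_ze)
    return mean_reported / rms_dispersion(z_phot, z_spec)
```

For example, `error_ratio([0.53, 0.48, 0.58, 0.56], [0.50, 0.52, 0.55, 0.60], [0.02, 0.02, 0.03, 0.03])` compares a mean reported error of 0.025 with the rms of the four toy residuals.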

Fig. 11 presents the overall redshift distribution of
spectroscopic galaxies in our testing set (solid black line). The
colored lines display the spectroscopic redshift distributions
of testing set galaxies in slices of width ΔZphot = 0.05 from
0.4 to 0.7, when we estimate Zphot using case iii (including A_i
and a/b in the training sample). In all cases, the dashed lines
represent galaxies with c_∥ > 1.6. These distributions suggest
that, if one wishes to use slices of width ΔZphot = 0.05, the
bins 0.45 < Zphot < 0.5, 0.5 < Zphot < 0.55, 0.55 < Zphot <
0.6, and 0.6 < Zphot < 0.65 contain most of the information,
as the bins 0.4 < Zphot < 0.45 and 0.65 < Zphot < 0.7 have
their true redshifts almost entirely within the adjacent Zphot
slice.

5.2 Photometric Redshift Catalog 

We construct a photometric redshift catalog using the
objects selected as described in Section 2, and using the ANNz
training which includes both Galactic extinction and axis
ratios (case iii). We did not find any significant difference
in the accuracy of the Zphot estimates when we added A_i
information to the training (case ii). However, we do find a
large difference in the full Zphot distributions. In particular,
we find that the asymmetry between the North and South
increases as a function of Zphot. For 0.6 < Zphot < 0.65,
c_∥ > 1.6, and weighting by psg, we find a 7.5% larger number
density of LGs in the South when we do not include
A_i values in the training, and only a 4.9% larger number
density when we do include A_i.

Figure 10. The rms between the Zphot estimate and the spectroscopic redshift in the BOSS testing set, as a function of the Zphot estimate
(left panel), c_∥ (middle panel), and the axis ratio, plotted as log10(a/b), of the ellipse corresponding to the exponential fit to the i-band profile (right panel). We display
results for the three separate spectroscopic redshift samples we use to train ANNz: i) when we train using only the g, r, i, z (de-reddened)
model magnitudes (black triangles), ii) when we also include the Galactic extinction in the i-band (A_i; A_i = 2.086 E(B − V); red squares),
and iii) when we also include the axis ratio for the exponential fit to the i-band profile (a/b; blue circles).

Figure 11. The redshift distributions of spectroscopic galaxies
in our testing set, for different Zphot selections. The dashed lines
represent galaxies with c_∥ > 1.6. The vertical dotted lines
delineate the Zphot bounds of the six ΔZphot = 0.05 redshift slices
that are displayed.

In Fig. 12 we display the ratio of the number of objects
in the South to the number of objects in the North as a
function of the Zphot estimate, in black (we note that at
high redshift, the red curve overlaps the black curve). The
dashed black line shows the ratio of the area in the South to the
area in the North (0.347). For Zphot > 0.55 and Zphot < 0.46,
there is a significant excess in the number of objects in the
South. When we apply the d_⊥ > 0.5564 cut to objects in the
South, there is a significant decrease in the number of objects
in the South with Zphot < 0.46; however, we find almost no
change at larger photometric redshifts. This is because
objects that are assigned larger Zphot have larger
d_⊥ values. In fact, we find a linear relationship between the
average Zphot and d_⊥, given by Zphot = 0.53 d_⊥. Inserting the
Δd_⊥ = 0.0064 offset for objects in the South (as suggested
by Schlafly & Finkbeiner 2010b) yields a bias of Δz = 0.0034
for objects in the South. When we subtract 0.0034 from each
Zphot in the South, we find that the ratio between the number
of objects in the South and North (the blue curve in Fig. 12)
becomes nearly constant as a function of Zphot. As we found
in Section 4.4, assuming a difference in d_⊥ of
0.0064 (and its full consequences) removes the asymmetry
between the distribution of objects in the North and South.
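As a quick check of the arithmetic behind the quoted redshift bias, and assuming the linear relation between the average Zphot and d_⊥ can simply be differenced to propagate a small colour offset:

```latex
\Delta z \;=\; 0.53\,\Delta d_\perp \;=\; 0.53 \times 0.0064 \;\approx\; 0.0034
```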

Fig. 11 implies that the majority of the cosmological
information will be located within four Zphot bins: 0.45 <
Zphot < 0.5 (which we denote 1), 0.5 < Zphot < 0.55 (which
we denote 2), 0.55 < Zphot < 0.6 (which we denote 3), and
0.6 < Zphot < 0.65 (which we denote 4). The characteristics
of each bin are summarised in Table 2. The training further
suggests that we can only obtain accurate Zphot estimates
for objects with c_∥ > 1.6; thus we also make this cut in each
bin.



5.3 Estimating True Redshift Distributions 

To properly analyze any angular clustering measurement,
one must know the true redshift distribution of the galaxies
being used. This task is made relatively simple for the
photometric redshift catalog we produce, as we expect the
distributions to be similar to those of the training sample.
However, the match is not perfect, and blindly assuming
that the full catalog has the same distribution as the
training sample would be folly; differences between the
catalogs must be accounted for. Based on our testing sample, we
found that the actual dispersion between the photometric
and spectroscopic redshift was well correlated with the error
estimate, but also subject to the values of c_∥ and a/b. Thus,
we can compare the distributions of photometric redshifts,
error estimates, c_∥, and a/b in the full data set to those of
the testing set and use this information to estimate the true
redshift distribution in any Zphot slice.

Figure 12. The ratio between the number of objects, when
weighting by psg, in the South and North, as a function of
Zphot for our fiducial sample (black), when we apply the cut
d_⊥ > 0.5564 to objects in the South (red), and when we apply
the cut d_⊥ > 0.5564 and subtract 0.0034 from every Zphot for
objects in the South (blue). The dashed black line displays the
ratio of area in the South and North. Errors are Poisson.

Table 2. The characteristics of the four photometric redshift
(Zphot) bins we use, where N is the number of objects in the bin,
σ_z0 refers to the value of √⟨(Zphot − Zspec)²⟩ in the testing set,
and σ_zt refers to the level of dispersion we infer for the full
data set, using the methods described in Section 5.3.

bin   Zphot range       N         σ_z0     σ_zt
1     0.45 < z < 0.5    214,971   0.0427   0.0431
2     0.5 < z < 0.55    258,736   0.0427   0.0442
3     0.55 < z < 0.6    248,895   0.0501   0.0524
4     0.6 < z < 0.65    150,319   0.0601   0.0633

For bin 1 (0.45 < Zphot < 0.5), the mean of the Zphot
error estimated by ANNz, σ_ze, is 0.0224, while it is slightly
lower, 0.0222, for the testing set (recall that these estimates
underestimate the actual √⟨(Zphot − Zspec)²⟩ by ~66%).
The average c_∥ and a/b of the full bin 1 and the testing set
subset of bin 1 agree within 0.3%. We find the deviations
in c_∥ are similarly small for the other three redshift bins.
Overall this suggests that the true redshift distribution of
bin 1 is slightly wider than the spectroscopic distribution,
because its σ_ze are 1% larger than for the testing
set. Thus, referring to σ_zt as the true dispersion in the
bin and σ_z0 as the dispersion in the testing set, we estimate
σ_zt = 0.0431 given that σ_z0 = 0.0427. In Table 2, we list the
σ_z0 we measure from the testing set and the σ_zt we estimate
for each photometric redshift bin.

Figure 13. The estimated (normalized) redshift distributions for
the four labeled Zphot slices, determined by sampling Gaussians
around each spectroscopic LG in the testing data, as described in
Section 5.3.

For bin 2 (0.5 < Zphot < 0.55), the differences are more
substantial. The value of σ_ze is 3% larger for the full
catalog (0.0268 compared to 0.0260) and the average value of
log10(a/b) is 2% larger (0.170 compared to 0.166). Fig. 10
suggests a linear relationship between the photometric
redshift dispersion and log10(a/b), for log10(a/b) < 0.3, that
is σ_z ≈ 0.05 + 0.067 log10(a/b), suggesting the overall error
should be 3.5% larger in bin 2 than for the testing set. The
differences grow larger for bin 3 (0.55 < Zphot < 0.6): the
value of σ_ze is 4% larger (0.0307 compared to 0.0296) than
in the testing set and the average value of log10(a/b) is 5%
larger (0.180 compared to 0.172), suggesting the errors in
bin 3 are 4.6% larger than in the testing set. We find similar
differences for bin 4 (0.6 < Zphot < 0.65): the values of σ_ze
are 0.0334 and 0.0320 and the average log10(a/b) are 0.185
and 0.176, and we therefore expect the errors to be 5.4%
larger than in the testing set.
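The quoted inflation factors can be reproduced with the short sketch below. It assumes that the ratio of mean σ_ze values and the ratio implied by the linear σ_z relation in log10(a/b) are combined multiplicatively; the text does not state how the two effects are combined, so this is an inference that happens to match the quoted 3.5%, 4.6% and 5.4%. The function names are hypothetical.

```python
def sigma_from_axis_ratio(log_ab):
    """Approximate linear relation read from Fig. 10."""
    return 0.05 + 0.067 * log_ab


def inflation_factor(sigma_ze_full, sigma_ze_test, log_ab_full, log_ab_test):
    """Assumed multiplicative combination of the two ratios (full / testing)."""
    ratio_ze = sigma_ze_full / sigma_ze_test
    ratio_ab = sigma_from_axis_ratio(log_ab_full) / sigma_from_axis_ratio(log_ab_test)
    return ratio_ze * ratio_ab


# (mean sigma_ze full, mean sigma_ze testing, mean log10(a/b) full, testing)
bins = {
    2: (0.0268, 0.0260, 0.170, 0.166),
    3: (0.0307, 0.0296, 0.180, 0.172),
    4: (0.0334, 0.0320, 0.185, 0.176),
}
percent_larger = {
    b: round(100.0 * (inflation_factor(*values) - 1.0), 1)
    for b, values in bins.items()
}
```

Note that the axis-ratio term contributes only about half a percent to one percent in each bin; most of the inflation comes from the larger σ_ze in the full catalog.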

To correct for the differences between the testing set
and full sample, we sample a Gaussian, for each spectroscopic
redshift within the photometric redshift bin, of width
such that the average dispersion in the bin increases to that
which we expect. The dispersion we expect, σ_zt, is related to
the dispersion in the testing set, σ_z0, and the width of the
Gaussian, σ_d, via

$$\sigma_{zt}^2 = \sigma_{z0}^2 + \sigma_d^2. \qquad (31)$$

We are assuming we can relate σ_zt = a σ_z0, as described in
the previous paragraph. Therefore

$$\sigma_d^2 = \sigma_{z0}^2\,(a^2 - 1). \qquad (32)$$

For bin 4, a = 1.054 (since we determined the dispersion
should be 5.4% larger for the full sample than for the testing
set) and σ_z0 = 0.06, which yields σ_d = 0.02. We find
that, with sufficient sampling, the resulting n(z) are smooth
up to steps of 0.001 in Δz. We display the n(z) we obtain for
our four bins in Fig. 13. In general, the displayed n(z) are
poorly fit by distributions such as a Gaussian or a Lorentzian
(especially at the tails). Thus, we use the plotted distributions,
which are binned in steps of width 0.001 in redshift,
and interpolate between these points as needed in order to
obtain the n(z) used in Eq. 27.
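The broadening step can be sketched as follows. This is a minimal illustration of sampling a Gaussian of width σ_d around each spectroscopic redshift and histogramming the draws in steps of 0.001; the function names, the number of draws per galaxy, and the seed are assumptions for the example rather than the values used for Fig. 13.

```python
import math
import random


def extra_width(sigma_z0, a):
    """Gaussian width sigma_d such that sigma_zt = a * sigma_z0 (Eq. 32)."""
    return sigma_z0 * math.sqrt(a * a - 1.0)


def broadened_nz(z_spec, sigma_d, n_draws=200, dz=0.001, seed=0):
    """Normalised n(z) from Gaussian draws around each spectroscopic redshift.

    Returns a dict mapping the lower edge of each dz-wide bin to the density,
    normalised so that the sum of density * dz equals one.
    """
    rng = random.Random(seed)
    counts = {}
    for z in z_spec:
        for _ in range(n_draws):
            k = math.floor(rng.gauss(z, sigma_d) / dz)
            counts[k] = counts.get(k, 0) + 1
    total = sum(counts.values())
    return {k * dz: c / (total * dz) for k, c in sorted(counts.items())}
```

With the bin 4 values, `extra_width(0.06, 1.054)` gives approximately 0.02, in agreement with the value quoted above.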



6 CLUSTERING IN PHOTOMETRIC 
REDSHIFT BINS 

6.1 Auto- and Cross-correlation Function 
Measurements 

In order to investigate the photometric redshift catalog, we
have measured the angular auto- and cross-correlation
functions of galaxies in the four photometric redshift bins
summarized in Table 2. This will allow us to investigate how
systematics affect particular photometric redshift bins and
whether the redshift distributions are the same as suggested by the
training data.

Fig. 14 displays the measured auto-correlation functions
for LGs, multiplied by θ, in our four photometric redshift
bins for five cases: with no corrections (black triangles); with a
correction for stars and sky background (calculated as described in
Section 3.1; red squares, 'C_star,sky'); when subtracting the effective
masked area per star, A_star, for every star in each pixel and
correcting for sky background (blue circles; '−A_star, C_sky');
when applying weights based on the n/n_tot(sys) relationships
iteratively determined in the order n_star, airmass, seeing, A_r,
d_⊥ offset, sky (open green triangles; 'Weights'); and when applying
the cut d_⊥ > 0.5564 and subtracting 0.0034 from each photometric
redshift for objects in the South while also applying the
−A_star, C_sky correction (open orange squares; 'ASouth').
Correcting for either seeing or Galactic extinction makes a
negligible difference. For each bin, the displayed error bars
are the jack-knife errors.

Interestingly, only bin 1 and bin 4 show any significant
effect from sky background; for bins 1 through 4 the values
of ε_sky for C_star,sky are 0.098, 0.021, 0.034, 0.137. This
result is consistent with the assertion that the dependence
on sky background is related to its effect on the magnitude
errors. The lowest redshift bin should show the largest effect
from objects scattering around the d_⊥ > 0.55 cut (as can
be inferred from the difference between the red and black
curves in Fig. 12), and the highest redshift bin should show
the largest effect from objects scattering across the faint
magnitude limit.

For each of the measurements displayed in Fig. 14, we
determine the best-fit bias, b, given our fiducial cosmological
model. In each panel of Fig. 14, the black curve displays
the best-fit model when fit to the measurements with
the C_star,sky corrections applied. We fit to angular scales
1° < θ < 20° (the equivalent physical separation for a 1°
separation is 23.2 h^-1 Mpc at z = 0.5), for which there are
16 data points in each redshift bin. The best-fit b and the
associated χ² per degree of freedom (χ²/dof) are listed for
each redshift bin and each of the five separate estimations
of w(θ) in Table 3. Given that we are using a theoretical
covariance matrix to compare our measured and
model w(θ), we have 15 dof regardless of the corrections
that we apply. The covariance matrices assume there is no
added covariance due to systematics, and the corrections are
an attempt to remove this covariance and allow the proper
comparison between measurement and model.
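A one-parameter bias fit of this kind can be sketched as below. The sketch assumes the simplest linear-bias model, in which the model w(θ) is b² times a fixed template computed for unbiased matter; the fiducial model used in the text may include further ingredients (for example redshift-space distortions), so this illustrates the technique rather than reproducing the fitting code. All function names are hypothetical, and a small hand-written linear solver is used so that the full (non-diagonal) covariance matrix enters the χ².

```python
import math


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fit_bias(w_obs, w_template, cov):
    """Return (b, chi2) minimising chi2 for the model b^2 * w_template.

    Uses the symmetry of `cov`; assumes the best-fit amplitude is positive.
    """
    cinv_template = solve(cov, w_template)
    amplitude = dot(w_obs, cinv_template) / dot(w_template, cinv_template)
    resid = [d - amplitude * m for d, m in zip(w_obs, w_template)]
    chi2 = dot(resid, solve(cov, resid))
    return math.sqrt(amplitude), chi2
```

With 16 data points and one fitted amplitude, the returned χ² would be divided by 15 to obtain the χ²/dof discussed above.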

In every case, applying some form of correction reduces
the χ²/dof compared to the case when no corrections are
applied. In each redshift bin other than 1, the χ²/dof is
greater than 3 when no corrections are applied. When
corrections are applied, the χ²/dof are all less than 2, and
only for bin 2 are they significantly greater than 1. After
each of the minimum χ²/dof reported in Table 3, we list,
in parentheses, the χ²/dof obtained when we fit to a maximum
scale of 60° (which is the largest scale to which we
measure w(θ)) and keep the model the same as the best-fit
for 1° < θ < 20°. This tests the consistency of the six
measurements with 20° < θ < 60°. Notably, none of the χ²/dof
become significantly worse.

The most extreme results are obtained when the
ASouth corrections are applied. For bin 2, only 0.8% of w(θ)
measurements consistent with our fiducial model would have
a χ²/dof greater than the value of 2.1 that we find, while
for bin 4 we find χ²/dof = 0.38 and would expect 98.4%
of w(θ) measurements to have a greater value. As isolated
cases, neither result is particularly remarkable. However,
regardless of the correction technique, the χ²/dof for bin 2
are all greater than 1.8. This result is caused, in large part,
by the w(θ) measurements at ~ 2°, which are considerably
smaller than the w(θ) predicted by our best-fit model.
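The tail probabilities quoted above follow from the χ² distribution with 15 degrees of freedom. A minimal stdlib-only sketch, assuming Gaussian errors and using the standard series for the regularised lower incomplete gamma function (the function name `chi2_sf` is an assumption for this example):

```python
import math


def chi2_sf(chi2, dof):
    """Probability that a chi-square variate with `dof` degrees of freedom
    exceeds `chi2`, via the series for the lower incomplete gamma function."""
    a, x = dof / 2.0, chi2 / 2.0
    term = 1.0
    total = 1.0
    n = 1
    while term > 1e-15 * total:
        term *= x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a + 1.0))
    return 1.0 - lower
```

For example, `chi2_sf(2.1 * 15, 15)` gives a tail probability a little under 1%, and `chi2_sf(0.38 * 15, 15)` gives about 98%, consistent with the percentages in the text given that the tabulated χ²/dof values are rounded.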

In order to subtract the effective area per star, A_star,
for every star in each pixel, we use a different value of r_star in
each photometric redshift bin. We determine these values by
fitting for the value of r_star that makes n/n_tot(n_star) most
consistent with one for the objects in the redshift bin (just
as was done for the full sample as described in Section 4.1).
Further, we should expect slightly different values of r_star
given that Fig. 3 shows different relationships for different
magnitude LGs, and the average magnitudes are different
in each photometric redshift bin.

The −A_star, C_sky (blue circles) and C_star,sky (red
squares) corrections result in nearly identical measurements.
Thus the best-fit bias values (as presented in Table 3) differ
by no more than 1.5%. Interestingly, the jack-knife errors
are smaller in general when we subtract A_star, suggesting
that this action reduces the fluctuations within the sample.

When we use the Weight method (open green triangles)
to correct our measurements, the resulting w(θ) amplitudes
are slightly smaller than any of the other measurements.
This translates to marginally lower (between 1.0% and 2.5%
compared to the 'Corrections' values) best-fit bias values.
This suggests that the Weight method may slightly suppress
some true fluctuations. The jack-knife errors in the Weight
case are similar to those determined using the −A_star, C_sky
correction; applying the weights to the density field
decreases the sample variance. The consistency between the
best-fit models and the data is also quite similar to the
−A_star, C_sky results, except for bin 4, where the χ²/dof
decreases significantly.

Figure 14. The measured angular auto-correlation functions, w(θ), for our photometric redshift bins 0.45 < Zphot < 0.5 (top left),
0.5 < Zphot < 0.55 (top right), 0.55 < Zphot < 0.6 (bottom left), and 0.6 < Zphot < 0.65 (bottom right), when no corrections are
applied (black triangles), when corrections for stars and sky background are applied in the manner described by Section 3.3 (red squares;
C_star,sky), when the effective area of stars, A_star, is removed from each pixel and a correction for sky background is applied (blue
circles; −A_star, C_sky), when iterative weights are applied to the LG density field used to calculate w(θ) (green open triangles), and when
−A_star, C_sky is applied to the w(θ) of LGs selected such that d_⊥ > 0.5564 and 0.0034 is subtracted from Zphot for objects in the South
(orange open squares; ASouth).

Imposing the d_⊥ > 0.5564 cut, subtracting 0.0034
from Zphot for objects in the South, and applying the −A_star, C_sky
correction (ASouth; orange open squares) only makes a
significant difference in bin 4. The best-fit bias values change
by less than 1% compared to the −A_star, C_sky best-fit values
for every bin other than 4, where we find a 2.4% decrease. Again
compared to the −A_star, C_sky results, we find a marginal
decrease in the χ²/dof values for bins 1 and 3, but for bin 2, we
find a marginal increase. The increase in the χ²/dof for bin
2 is driven mainly by the w(θ) measurement at 1.8 degrees.
For bin 4, the χ²/dof decreases by more than 50%, and as
can be seen in the bottom right panel of Fig. 14, nearly all
of the measurements at scales > 3° become more consistent
with the model.

Table 3. The minimum χ² per degree of freedom (χ²/dof) and corresponding best-fit bias value (b) obtained when fitting our
measurements at scales between 1° and 20° (for which we have 16 measurements and thus 15 degrees of freedom; numbers in parentheses are the
χ²/dof when we include the six additional measurements at θ > 20°) to our fiducial cosmological model, for 5 cases: 1) no corrections
are applied ('No Corrections'); 2) corrections for stars and sky brightness are applied ('Corrections'); 3) A_star is subtracted for each star
within a pixel of circular area corresponding to the radius r_star and a correction for sky background is applied as in case 2 (−A_star);
4) we iteratively determine weights to apply to the LG density field, as described in Section 4.3 (Weights); and 5) we apply the cut
d_⊥ > 0.5564 and subtract 0.0034 from each Zphot when selecting objects from the South and repeat the −A_star procedure (ASouth).

bin   No Corrections         Corrections              −A_star                          Weights                  ASouth
      χ²/dof, b              χ²/dof, b                χ²/dof, b, r_star                χ²/dof, b                χ²/dof, b
1     0.99 (1.0), 2.16 […]   0.79 (0.74), 2.12±0.07   0.79 (0.75), 2.12±0.07, 7.56"    0.91 (0.82), 2.08±0.07   0.79 (0.70), 2.11±0.07
2     3.9 (3.5), 2.26 […]    1.8 (1.5), 2.08±0.07     1.8 (1.6), 2.07±0.07, 10.6"      1.9 (1.7), 2.03±0.07     2.1 (1.8), 2.10±0.07
3     7.0 (5.8), 2.62        0.99 (1.1), 2.20±0.07    1.0 (1.1), 2.21±0.07, 12.0"      1.1 (0.97), 2.16±0.07    0.97 (1.0), 2.23±0.07
4     6.4 (5.7), 2.63        1.0 (1.0), 2.14±0.10     0.97 (0.97), 2.17 […], […]       0.64 (0.56), 2.12 […]    0.38 (0.43), 2.10 […]

Figure 15. The difference between the jack-knife error we estimate,
σ_jack, and the theoretical error we calculate, σ_th, divided
by σ_jack, for the four redshift bins 0.45 < Zphot < 0.5 (black
triangles), 0.5 < Zphot < 0.55 (red squares), 0.55 < Zphot < 0.6
(blue circles), and 0.6 < Zphot < 0.65 (open green triangles). Solid
lines connect the points representing the jack-knife errors and
covariance matrix when no corrections are made, and the dashed
lines connect the points representing the case where A_star is
subtracted for each star within a pixel of circular area corresponding
to the radius r_star.

In all of the bins, applying some form of correction
reduces the χ²/dof values, and for bins 2 through 4 the
corrections change the reduced χ² by at least 1.8 (and by as much
as 5.4 for bin 4). Further, the general agreement between
the different methods of correction suggests they can all be
applied to recover measurements that more closely represent
the true clustering of LGs. However, we note that they do
not recover the exact same results, suggesting there is some
level of systematic error that must be accounted for when
similar measurements are used to constrain cosmological
parameters. For bins 2 and 3, the variation in the best-fit bias
values is similar to the 1σ errors, suggesting that the
systematic uncertainties introduced by the need for corrections
are approximately as large as the statistical uncertainties.

Considering all of the results, we find the bias of the
LGs is nearly constant as a function of redshift, with slight
evidence of a decrease from high to low redshift. This is
close to what we expect for a sample selected to be
approximately passively evolving. Such a sample of b ~ 2 galaxies
will undergo a ~ 4% decrease in bias over the redshift range
0.475 < z < 0.625 (see, e.g., Fry 1996; Tegmark & Peebles
1998). More importantly, without any corrections, one might
assume our model of large-scale clustering is grossly in
error. However, with the corrections, we are given no reason to
doubt the standard cosmological model. This is consistent
with the results of Crocce et al. (2011), whose w(θ)
measurements, at similar redshifts to our own, are consistent with a
ΛCDM model for scales θ < 5°.
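The size of the expected bias evolution can be estimated with the passive-evolution relation of Fry (1996), in which b(z) − 1 scales inversely with the linear growth factor D(z). The sketch below is illustrative only: it assumes a flat ΛCDM cosmology with Ω_m = 0.3 (the fiducial parameters are not restated in this section) and uses the Carroll, Press & Turner (1992) fitting formula for the growth factor; the function names are hypothetical.

```python
def growth_factor(z, omega_m0=0.3):
    """Unnormalised linear growth factor D(z) for flat LCDM, from the
    Carroll, Press & Turner (1992) fitting formula. omega_m0 is an assumed value."""
    omega_l0 = 1.0 - omega_m0
    e2 = omega_m0 * (1.0 + z) ** 3 + omega_l0
    om = omega_m0 * (1.0 + z) ** 3 / e2
    ol = omega_l0 / e2
    g = 2.5 * om / (om ** (4.0 / 7.0) - ol + (1.0 + om / 2.0) * (1.0 + ol / 70.0))
    return g / (1.0 + z)


def passive_bias(z, b_ref, z_ref, omega_m0=0.3):
    """Fry (1996) passive evolution: (b(z) - 1) * D(z) is constant."""
    ratio = growth_factor(z_ref, omega_m0) / growth_factor(z, omega_m0)
    return 1.0 + (b_ref - 1.0) * ratio


# Bias at z = 0.475 for a population with b = 2 at z = 0.625.
b_low = passive_bias(0.475, b_ref=2.0, z_ref=0.625)
fractional_decrease = 1.0 - b_low / 2.0
```

Under these assumptions the decrease comes out at a few per cent, in line with the ~ 4% quoted above; the exact value depends on the adopted cosmology.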

Figure 15 displays the fractional differences we find
between the jack-knife errors and the theoretical errors we
calculate, for the four photometric redshift bins we use. The
solid lines connect the data that represent the case where
we apply no corrections, while the dashed lines connect the
data representing the −A_star, C_sky corrections. The results
are noisy, but the jack-knife and theoretical errors are
similar at scales < 10°, while the theoretical errors are larger at
greater scales. We note that the agreement is due, in part,
to the large differences in the best-fit bias between the cases
where corrections are and are not applied. The jack-knife
errors are larger when no corrections are applied to the density
field, but the best-fit bias is larger as well (see Table 3), and
thus the amplitudes of the jack-knife and theoretical errors
are similar in both cases. In most cases, the jack-knife errors
are smaller than the theoretical estimate at large scales, but
this effect is more dramatic when corrections are applied to
the density field. One would expect that the jack-knife
errors should under-estimate the true uncertainty as the scale
grows larger and the different jack-knife regions thus become
more correlated.

Fig. 16 presents the angular cross-correlation functions
between our four photometric redshift samples, after
applying corrections for stars and sky background. For
comparison, we display model curves where we assume a bias
equal to the geometric mean of the bias of the two bins
being cross-correlated (i.e., for 2x3, b = 2.06). In general,
the amplitudes of the cross-correlations are consistent with
the redshift distributions determined from our testing set.
The cross-correlations that are least consistent are 1x2 and
2x3. The inconsistency could be due to a number of
factors. Apart from the redshift distributions being incorrect,
it is possible that the bias of the galaxies contributing to
the cross-correlation (e.g., for 2x3, the high redshift edge of
the 0.5 < Zphot < 0.55 distribution) is lower than for the
overall sample. This is likely if objects with lower bias also
have larger photometric redshift errors. The facts that the
model curves for 1x2 and 2x3 are only marginally outside of
the 1σ errors and that the agreement appears excellent for
the other cross-correlations suggest that there is no
significant disagreement. If we do not apply the corrections to the
measurements, they are greatly divergent from the models.

6.2 Comparison With MegaZ 

The MegaZ-LRG DR7 catalog (MegaZ hereafter) is a
photometric redshift catalog similar to our own (Thomas et al.
2011). It used ANNz to train SDSS DR7 objects with
similar color selection to ours and spectroscopic redshifts from
the 2SLAQ survey (Cannon et al. 2006). Compared to the
sample used in Thomas et al. (2011), the most notable
differences are that they impose a cut i_dev < 19.8 while the
BOSS survey uses i_cmod < 19.9 and the sliding cut defined
by Eq. 6. Further, 2SLAQ spectra were targeted for objects
with i_fibre < 21.4 (our fibre magnitude cut is i_fibre2 < 21.7).

The MegaZ data and their corresponding mask are publicly
available^2. Similar to our catalog, there is a photometric
redshift and a probability that the object is a galaxy (we
denote this psgM). We employ the same cuts on the MegaZ
catalog as Thomas et al. (2011), the most notable one being
psgM > 0.2. The different target selections result in
discrepancies between the overall redshift distributions. These
disagreements are most extreme for the lowest and highest
redshift bins. Weighting by psg, our catalog has 20% of its
objects with 0.45 < Zphot < 0.5, while the MegaZ catalog has
36% of its data in this redshift bin. The 0.6 < Zphot < 0.65
photometric redshift bin contains 14% of our data, while
only 10% of the MegaZ data have photometric redshifts in
this range. Our overall number density is slightly smaller,
however, as for 0.45 < Zphot < 0.65, we have a number
density of 88 deg^-2 (872,921 objects over 9,913 deg^2) and
MegaZ has 93 deg^-2 (723,556 objects over 7,746 deg^2).

The mask provided by MegaZ is in the Healpix format
at Nside = 1024, and we can therefore calculate w(θ) using
nearly identical methods as described in Section 3.1. We
calculate w(θ) for MegaZ data in the same four photometric
redshift bins as our own catalog (we note these are also the
photometric redshift bins used by Thomas et al. 2011). We
display these measurements, compared to our own, in Fig.
17. The black triangles display the measurements we obtain
when we cut objects from the MegaZ catalog with psg < 0.2,
while the red squares display the results when we correct
these same MegaZ measurements for stars in the DR7 area,
using the method described in Section 3.3. The blue circles
display the measurements, for the full DR8 area, when we
correct w(θ) of our catalog for stars.

^2 http://zuserver2.star.ucl.ac.uk/~sat/MegaZ/MegaZDR7.tar.gz

The small scale amplitudes of the MegaZ measurements 
and our own are not directly comparable, because the red- 
shift distributions may be different. However, Thomas et 
al. (2011) find a significantly smaller bias in their 0.45 < 
Zphot < 0.5 bin than their other bins, whereas we find only 
a small difference in the bias of our different photometric 
redshift bins (see Table 1). It therefore makes sense that the 
MegaZ amplitudes are significantly smaller than ours. For 
the middle two redshift bins, the small scale amplitudes of 
the MegaZ measurements and our own are generally consis- 
tent with the displayed jack-knife errors. This result is no 
surprise given that we would expect the (small) differences
in the target selection to only significantly affect the
highest and lowest portions of the redshift distribution. Finally,
Thomas et al. (2011) found a substantially higher bias in
the MegaZ 0.6 < Zphot < 0.65 sample than in their other bins,
while for our sample, the bias in this bin is quite similar to
that of the 0.55 < Zphot < 0.6 bin. Therefore, the MegaZ
amplitudes are larger than ours for 0.6 < Zphot < 0.65.

Correcting for stars makes a substantial difference in 
the large-scale clustering of the MegaZ data in the lowest 
and highest redshift bins, but has little effect in the middle 
two bins. We note that our measurements show no appreciable 
change when we use only the DR7 area, suggesting 
that the differences between the two samples must be due 
to additional systematic effects on either the MegaZ data or 
our own. 

We note that Thomas et al. (2011) measured $C_\ell$ with 
these MegaZ samples, and found significant deviations from the 
expected clustering (from $\Lambda$CDM models) to be confined to 
the lowest $\ell$ bins. For measurements of $w(\theta)$, the covariance 
between angular scales implies that an excess at low $\ell$ will 
affect measurements at smaller angular scales. Given that 

$$w(\theta) = \sum_{\ell} \left(\frac{2\ell+1}{4\pi}\right) P_{\ell}(\cos\theta)\, C_{\ell}, \qquad (33)$$

where $P_\ell$ are Legendre polynomials, one can determine that 
a 400% excess for $\ell < 5$ and a 50% excess for $5 < \ell < 10$ 
(similar to the excess found in the $0.6 < z_{\rm phot} < 0.65$ bin 
by Thomas et al. 2010b) translates to a 30% larger $w(\theta)$ at 
$\theta = 3^\circ$. This estimate is roughly consistent with the difference 
between the un-corrected (black triangles) and corrected 
(red squares) MegaZ $w(\theta)$, displayed in the lower-right 
panel of Fig. 17; i.e., we find the systematic effect of 
stars on the $0.6 < z_{\rm phot} < 0.65$ MegaZ $w(\theta)$ to be consistent 
with the low $\ell$ excess found by Thomas et al. (2010b). 
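
The propagation of a low-$\ell$ excess into $w(\theta)$ through Eq. 33 can be sketched numerically. The example below is illustrative only. The power spectrum `cl_toy` is an arbitrary smooth power law that we assume for demonstration (it is not a $\Lambda$CDM prediction and will not reproduce the 30% figure quoted above), the function names are ours, and we assume the 50% excess applies to $5 \le \ell < 10$.

```python
import numpy as np
from numpy.polynomial.legendre import legval


def w_from_cl(theta_deg, cl):
    """Evaluate Eq. 33; cl[l] holds C_l for l = 0 .. len(cl) - 1."""
    cl = np.asarray(cl, dtype=float)
    ell = np.arange(cl.size)
    coeffs = (2.0 * ell + 1.0) / (4.0 * np.pi) * cl
    # legval sums coeffs[l] * P_l(x) over l.
    return legval(np.cos(np.radians(theta_deg)), coeffs)


def add_low_ell_excess(cl):
    """Apply a 400% excess for l < 5 and a 50% excess for 5 <= l < 10."""
    boosted = np.array(cl, dtype=float)
    boosted[:5] *= 5.0    # a 400% excess means 5x the original power
    boosted[5:10] *= 1.5  # a 50% excess means 1.5x the original power
    return boosted


# Toy spectrum (an assumption, not a model fit): a smooth power law with
# the monopole and dipole set to zero.
ell = np.arange(2001)
cl_toy = np.zeros(ell.size)
cl_toy[2:] = 1.0e-4 * (ell[2:] + 10.0) ** -1.2

theta = 3.0  # degrees
ratio = w_from_cl(theta, add_low_ell_excess(cl_toy)) / w_from_cl(theta, cl_toy)
print(f"w(3 deg) with low-l excess / without: {ratio:.2f}")
```

Because $P_\ell(\cos\theta)$ is positive for all $\ell < 10$ at $\theta = 3^\circ$, any added power at these multipoles raises $w(\theta)$ at that scale; the size of the increase depends on the shape of the true $C_\ell$.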

Failure to apply corrections for the systematic effects 
described in this paper would clearly bias the cosmological 
parameters one could determine based on our $w(\theta)$ measurements. 
However, our results suggest that $C_\ell$ measurements 
for $\ell > 10$ might not result in biased measurements. 
We therefore have no reason to believe that the measurements 
of cosmological parameters by Thomas et al. (2011) 
have significant systematic errors associated with them. The 
magnitude of potential systematic errors on the cosmological 
parameters determined with $C_\ell$ spectra is studied in detail 
in Ho et al. (in prep.). 

Figure 16. The measured angular cross-correlation functions between our four photometric redshift bins, $0.45 < z_{\rm phot} < 0.5$ 
(1), $0.5 < z_{\rm phot} < 0.55$ (2), $0.55 < z_{\rm phot} < 0.6$ (3), and $0.6 < z_{\rm phot} < 0.65$ (4), after subtracting the systematic effects of stars and sky 
background via Eqs. 28 - 30. The curves display the theoretical models for $b_1 = 2.08$, $b_2 = 1.96$, $b_3 = 2.16$, and $b_4 = 2.11$, where the bias 
used is the geometric mean of the bias for the two bins involved. 
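
The Fig. 16 caption states that the model for each cross-correlation uses the geometric mean of the biases of the two bins involved. A minimal sketch of that bookkeeping, using the four bias values quoted in the caption (the helper name `cross_bias` is ours):

```python
import math
from itertools import combinations

# Best-fit biases quoted in the Fig. 16 caption, keyed by bin number.
bias = {1: 2.08, 2: 1.96, 3: 2.16, 4: 2.11}


def cross_bias(b_i, b_j):
    """Geometric mean of the biases of the two bins."""
    return math.sqrt(b_i * b_j)


cross = {(i, j): cross_bias(bias[i], bias[j]) for i, j in combinations(bias, 2)}
for (i, j), b in cross.items():
    print(f"bins {i} x {j}: b = {b:.3f}")
```

Under linear bias the cross-correlation amplitude scales as $b_i b_j$, which is the square of this geometric mean.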



7 CONCLUSIONS 

We have investigated the systematic effects on the angular 
distribution and spectroscopic/photometric redshift distributions 
of objects matching the BOSS CMASS selection. 
We find that not correcting for the foreground presence of 
stars, which effectively mask small areas of the sky, produces 
a systematic error that is (generally) significantly larger than 
the statistical error at scales greater than 3°. The measured 
$w(\theta)$, after accounting for foreground stars, are generally 
consistent with $\Lambda$CDM predictions, even at the largest scales 
we measure, but are grossly inconsistent ($\chi^2$/dof as large as 
6.3) when the effects of foreground stars are ignored. Our 
primary results can be summarized as follows: 

• We select objects from the SDSS DR8 CAS, using the criteria 
defined for the BOSS CMASS sample, yielding a sample 
of 1,065,823 objects within our masked footprint, which we 
match to 112,778 existing BOSS spectra (see Section 2.1). 
We train ANNz to output a probability that each of these 
objects is a galaxy (see Section 2.2). 

• Stars occult a small area of the sky, reducing the ability 
to detect galaxies in their immediate vicinity (see Fig. 3). 
For our sample, stars with i-band magnitude less than 20.3 
have an effect out to at least 10" (see the top right panel of 
Fig. 3). 

• We account for stellar contamination by weighting every 
object by the probability that it is a galaxy. When doing so, 
we find a strong, negative correlation between the number 
density (in deg$^{-2}$) of objects in our sample and that of stars (see the 
red squares in the bottom-left panel of Fig. 4 and the black 
triangles in the bottom panel of Fig. 5), partially explained 
by our finding that stars effectively mask their local area. 

• We correct for the effect of stars on the local density of 
galaxies by assuming each star effectively removes a constant 
amount of area. We determine this area as described in Section 
4.1. We find that accounting for this area produces a 
significant change in $w(\theta)$, especially at large scales (see the 
left panel of Fig. 7). 

Figure 17. The measured angular auto-correlation functions in four photometric redshift bins, $0.45 < z_{\rm phot} < 0.5$ (top-left), $0.5 < 
z_{\rm phot} < 0.55$ (top-right), $0.55 < z_{\rm phot} < 0.6$ (bottom-left), and $0.6 < z_{\rm phot} < 0.65$ (bottom-right). Black triangles display the results 
using the same catalog (and cuts on it) as Thomas et al. 2010b. Red squares show these MegaZ measurements when they are corrected 
for stars, and the blue circles are the measured $w(\theta)$ using our catalog and correcting for stars, as described in Section 3.3. 


• We test two methods that can be applied in an attempt to 
correct the systematic errors introduced by any parameter 
that can be turned into a map on the sky. The "Correction" 
technique, first developed for large-scale structure measurements 
by Ho et al. (in prep.), is described in Section 3.3. 
When this method is applied to correct for the presence of 
stars, we recover nearly identical results as when we account 
for the effective area of stars (see the left panel of Fig. 7). 
The "Weights" method is described in Section 4.3. We find 
that applying it to stars, Galactic extinction, seeing, sky 
background, airmass, and Schlafly et al. (2010) offsets in $d_\perp$ 
results in $w(\theta)$ measurements that are nearly identical to 
those we obtain applying the Correction method to stars 
and sky background (see the right panel of Fig. 7). 

• We use ANNz to estimate photometric redshifts for every 
object in our sample. We find that including axis ratios improves 
the accuracy of the photometric redshift estimates in 
our testing set (see Section 5.1), and that including Galactic 
extinction improves their reliability when applied to the full 
sample (see Section 5.2). 

• We find an asymmetry in the density of objects in the 
North and South Galactic caps, which is removed (to within 
0.1%) when we account for the 0.0064 difference in $d_\perp$ 
between the North and South discovered by Schlafly & 
Finkbeiner (2010b) (see Section 4.4). This offset in $d_\perp$ implies 
that the photometric redshift estimates in the South 
are biased by 0.0034 compared to the North. When we correct 
for this bias, we find that the ratio of the number of 
galaxies in the South to the number in the North is approximately 
constant and consistent with the ratio of their 
areas for $0.46 < z_{\rm phot} < 0.65$ (see Fig. 12). Our $w(\theta)$ measurements 
for the full sample appear similar to a weighted 
average of the $w(\theta)$ calculated separately for the North and 
South (see Fig. 8). 

• We divide our photometric redshift catalog into four photometric 
redshift slices between $0.45 < z_{\rm phot} < 0.65$, as summarized 
in Table 2. We measure $w(\theta)$ for each slice, applying 
the various techniques we developed to correct systematic 
errors (see Fig. 15). We calculate the bias in each sample 
when using each of the techniques, assuming the same fiducial 
$\Lambda$CDM model, the results of which are summarized in 
Table 3. 

• We find that the magnitude of the corrections is larger 
than the statistical error for $z_{\rm phot} > 0.5$ and $\theta > 3^\circ$, and 
that applying some form of correction significantly reduces 
the minimum $\chi^2$/dof. 

• We find scatter in the best-fit bias values that is similar to 
their 1$\sigma$ uncertainty, suggesting that the systematic error introduced 
by the need for corrections is of similar magnitude 
to the statistical uncertainty. 
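
The effective-area correction for stars summarized in the list above can be illustrated with a minimal sketch. We assume, as stated there, that each star removes a constant area; the value `area_per_star`, the pixel area, and the toy counts below are hypothetical placeholders, not the quantities measured in Section 4.1.

```python
import numpy as np


def star_corrected_density(n_gal, n_star, pixel_area, area_per_star):
    """Galaxy density per pixel after removing the area occulted by stars.

    n_gal, n_star : (possibly weighted) counts per pixel
    pixel_area    : area of each pixel (same units as area_per_star)
    area_per_star : hypothetical constant area removed by one star
    """
    n_gal = np.asarray(n_gal, dtype=float)
    n_star = np.asarray(n_star, dtype=float)
    effective_area = pixel_area - n_star * area_per_star
    if np.any(effective_area <= 0):
        raise ValueError("stars remove the whole pixel; mask it instead")
    return n_gal / effective_area


# Toy example: the true density is the same everywhere, but pixels with
# more stars have less usable area and therefore fewer detected galaxies.
pixel_area = 1.0       # arbitrary units
area_per_star = 0.001  # illustrative value only
n_star = np.array([0, 50, 100, 200])
true_density = 90.0
n_gal = true_density * (pixel_area - n_star * area_per_star)

naive = n_gal / pixel_area
corrected = star_corrected_density(n_gal, n_star, pixel_area, area_per_star)
print(naive, corrected)
```

In this toy case the naive density is anti-correlated with the stellar counts, mimicking the negative correlation described above, while the corrected density is flat.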

The presence of foreground stars must be accounted for 
in any study of large-scale clustering, including the 3D 
clustering of the BOSS spectroscopic data. Further, simi- 
lar tests to those presented here will likely be necessary for 
the radial distribution of BOSS LGs and its impact on the 
measured clustering. 

The results of our study have strong implications 
for future photometric redshift surveys (such as DES, 
Pan-STARRS, and LSST). We were able to extensively investigate 
potential sources of systematics because we are 
determining $z_{\rm phot}$ estimates in the most ideal of cases: our 
training sample covers a large area, is representative, and is 
~10% as large as our full catalog. Further, we can include 
Galactic extinction values in the training because the training 
sample covers a range representative of our full sample. 

As documented throughout, it is not just the accuracy 
of the $z_{\rm phot}$ estimate that is important, but the probability 
that an object is a galaxy as well. Reliable probabilities 
that an object is a galaxy are crucial for disentangling the 
systematic and contamination effects of stars on the density 
field of LGs. Even though our training set is quite large, the 
relationship shown in Fig. 2 is fairly noisy, implying that 
identifying robust methods of assigning probabilities that 
objects are galaxies should be a major focus of forthcoming 
photometric redshift surveys. 
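
To make the role of these probabilities concrete, the sketch below builds a pixelized density field in which each object contributes its galaxy probability $p_{sg}$ rather than a unit count. The pixel indices and probabilities are made-up toy values and the function names are ours.

```python
import numpy as np


def weighted_pixel_counts(pixel_index, p_sg, n_pixels):
    """Sum the galaxy probability of every object falling in each pixel."""
    pixel_index = np.asarray(pixel_index)
    p_sg = np.asarray(p_sg, dtype=float)
    return np.bincount(pixel_index, weights=p_sg, minlength=n_pixels)


def overdensity(counts):
    """delta = n / <n> - 1, averaged over the supplied pixels."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.mean() - 1.0


# Toy catalog: six objects in three pixels, one of them a likely star.
pixel_index = np.array([0, 0, 1, 2, 2, 2])
p_sg = np.array([1.0, 0.9, 0.2, 1.0, 0.95, 0.95])

counts = weighted_pixel_counts(pixel_index, p_sg, n_pixels=3)
delta = overdensity(counts)
galaxy_fraction = p_sg.sum() / p_sg.size  # analogous to summing p_sg over the catalog
print(counts, delta, galaxy_fraction)
```

A likely star (low $p_{sg}$) then contributes little to the density field, which is the sense in which the weighting suppresses contamination.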

Our ideal conditions may not be replicated often in the 
future (though the photometry should be much better) and 
many photometric redshifts will be determined via extrapolation 
of spectral templates. Robust and exhaustive exploration 
of potential systematics will be difficult under such 
circumstances, yet extremely important, and the resulting errors are 
likely to dominate at large scales. It is encouraging that 
Thomas et al. (2011) find very similar best-fit cosmological 
values when they separately use ANNz and various template-based 
methods to determine their photometric redshifts. 

Finally, the results we present suggest that the major 
systematic effects on our photometric redshift catalog of LGs 
have been identified and can be corrected for, allowing for 
robust cosmological measurements. Ho et al. (in prep.) show 
how the same systematics can be accounted for using the 
angular power spectrum, and present the cosmological con- 
straints obtained using the catalog of galaxies whose cre- 
ation is described in this paper. 

Figure A1. Top panel: The measured angular auto-correlation function of all objects in our LG catalog, with equal weighting for every 
object, multiplied by $\theta$. The red squares display the result when we correct for both the contamination and systematic effects of stars. 
The black line displays the measurements, corrected for the systematic effect of stars, calculated using the $p_{sg}$ weighting. Bottom panel: 
The value of $\epsilon$, as a function of $\theta$, based on the cross-correlation function of the un-weighted LG data and the stars. The solid line 
displays the $\epsilon$ we calculated from the $p_{sg}$ weighted cross-correlation. 



APPENDIX A: STELLAR CONTAMINATION 

Studies such as Myers et al. (2006) have previously described methods for identifying and correcting for stellar contamination. 
However, Fig. 3 and the left panel of Fig. 4 show that the effect of stars on the density field of LGs is two-fold: 1) some of 
the objects are stars and these are thus a contaminant and 2) the presence of stars systematically affects the number density 
of objects. Accounting for both effects makes Eqs. 28 - 30 more complicated. Given some fraction of the objects that are 
galaxies, $f_g$, and some fraction that are stars, $f_s$, they become: 

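$$\delta_g^o = f_g\left(\delta_g^t + \epsilon\,\delta_s\right) + f_s\,\delta_{sc}, \qquad (A1)$$

$$w^t(\theta) = w^o(\theta)/f_g^2 - w_s(\theta)\,\epsilon^2 - w_{sc}(\theta)\,(f_s/f_g)^2 - w_{s,sc}(\theta)\,2\epsilon f_s/f_g, \qquad (A2)$$

and 

$$\epsilon = \frac{w_{g,s}(\theta) - f_s\,w_{s,sc}(\theta)}{f_g\,w_s(\theta)}, \qquad (A3)$$
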

where $\delta_{sc}$ is the over-density of stars that act as contaminants and $\delta_s$ is the over-density of all stars. The sum of $p_{sg}$ for the full 
catalog suggests that 95.9% of the objects in the catalog are galaxies. Our training data imply that ~1.2% of these objects 
are quasars. We therefore assume $f_g = 0.96$ and $f_s \simeq 0.03$. We measure $w(\theta)$ and the cross-correlation between LGs and stars 
using our full catalog, equally weighting each object. If we assume that $\delta_s = \delta_{sc}$ and determine $\epsilon$ via Eq. A3, we find the 
result plotted with black triangles in the bottom panel of Fig. A1. The result is extremely close to the value of $\epsilon$ we obtain when 
we weight by $p_{sg}$, which is plotted with a solid black line. 

The top panel of Fig. A1 displays the $w(\theta)$ measured without any corrections or $p_{sg}$ weights (black triangles). The red 
squares display the result when we correct for the contamination and systematic effects of stars. This yields results similar to those obtained 
when we correct the $p_{sg}$ weighted measurements for the systematic effect of stars (solid line), but the un-weighted correction 
is systematically smaller than what would be required for the un-weighted and weighted results to agree. The disagreement 
is likely due to the fact that we are assuming $\delta_s = \delta_{sc}$ and that we know $f_s$. While we can be confident in the value of $f_g$ 
by summing $p_{sg}$, the fact that some of the objects are quasars implies we do not have an estimate of $f_s$. Further, it is quite 
possible that the stars that are mistaken for LGs, and are in our sample, have a different distribution than the full distribution 
of stars. Thus, we are making a number of (possibly incorrect) assumptions. 

The approach we adopt in this paper is to weight each LG by its value of $p_{sg}$. Assuming that the $p_{sg}$ values are accurate, 
this weighting effectively makes $f_g = 1$ (and $f_s = 0$), considerably simplifying the situation. We no longer have to worry about 
the percentage of objects that are quasars or whether the distribution of contaminant stars is different from the full distribution of 
stars. When there is stellar contamination, the cross-correlation between the observed LG density field and that of the stars 
will be $\approx f_g\,\epsilon\,w_s(\theta) + f_s\,w_s(\theta)$. The auto-correlation function of stars is positive and their cross-correlation function with LGs 
is negative (implying $\epsilon$ is negative). Therefore, when there is stellar contamination, one is likely to measure a cross-correlation 
that is ~0. This would make one (incorrectly!) assume that stars have no effect on the measured auto-correlation function of 
the LGs. Note that a cross-correlation of ~0, when one knows that stellar contamination exists, implies that there must be a 
systematic effect of the stars on the density field (since the auto-correlation of the stars is non-zero). We strongly recommend 
that, given reliable probabilities, one weight each object by the probability that it is a galaxy when measuring auto-correlation 
functions. 
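
The cancellation described above can be made explicit with a short numerical sketch. It assumes $\delta_s = \delta_{sc}$ (so that $w_{s,sc} = w_s$) and the fractions $f_g = 0.96$ and $f_s = 0.03$ quoted in this appendix; the stellar auto-correlation values are arbitrary toy numbers and the function names are ours.

```python
import numpy as np


def observed_cross_correlation(eps, w_s, f_g, f_s):
    """Observed LG-star cross-correlation when delta_s = delta_sc.

    w_gs = f_g * eps * w_s + f_s * w_s
    """
    return (f_g * eps + f_s) * np.asarray(w_s, dtype=float)


def epsilon_from_cross(w_gs, w_s, f_g, f_s):
    """Eq. A3 with w_{s,sc} = w_s."""
    w_gs = np.asarray(w_gs, dtype=float)
    w_s = np.asarray(w_s, dtype=float)
    return (w_gs - f_s * w_s) / (f_g * w_s)


f_g, f_s = 0.96, 0.03               # fractions assumed in the text
w_s = np.array([0.08, 0.05, 0.03])  # toy stellar auto-correlation values

# Value of epsilon for which contamination exactly hides the systematic.
eps_null = -f_s / f_g
w_gs_null = observed_cross_correlation(eps_null, w_s, f_g, f_s)

# A stronger systematic: the raw ratio w_gs / w_s underestimates |epsilon|.
eps_true = -0.05
w_gs = observed_cross_correlation(eps_true, w_s, f_g, f_s)
eps_naive = w_gs / w_s
eps_recovered = epsilon_from_cross(w_gs, w_s, f_g, f_s)
print(eps_null, eps_naive[0], eps_recovered[0])
```

With these fractions, an $\epsilon$ of about $-0.031$ produces a vanishing cross-correlation even though the systematic effect of stars is present.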



APPENDIX B: SYSTEMATIC CORRECTIONS 

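We display the solutions for $\epsilon_i$ based on Eqs. 29 and 30 (as first derived by Ho et al. 2011) for up to three systematics: 

$$\epsilon_3 = \frac{w_{g,3}\left(w_1 w_2 - w_{1,2}^2\right) - w_{g,2}\left(w_1 w_{2,3} - w_{1,2}w_{1,3}\right) - w_{g,1}\left(w_2 w_{1,3} - w_{1,2}w_{2,3}\right)}{w_1 w_2 w_3 + 2 w_{1,2}w_{1,3}w_{2,3} - w_1 w_{2,3}^2 - w_2 w_{1,3}^2 - w_3 w_{1,2}^2}, \qquad (B1)$$

$$\epsilon_2 = \frac{w_1}{w_1 w_2 - w_{1,2}^2}\left[w_{g,2} - \frac{w_{1,2}}{w_1}\left(w_{g,1} - \epsilon_3 w_{1,3}\right) - \epsilon_3 w_{2,3}\right], \qquad (B2)$$

$$\epsilon_1 = \frac{1}{w_1}\left[w_{g,1} - \epsilon_2 w_{1,2} - \epsilon_3 w_{1,3}\right], \qquad (B3)$$

where $w_{g,i}$ is the cross-correlation function of the LGs and systematic $i$, and $w_{i,j}$ is the cross-correlation function 
of systematic $i$ and $j$ (and is an auto-correlation function, $w_i$, when $i = j$). For only one systematic, the result is simple; one 
simply subtracts $\epsilon^2 w_{\rm sys}(\theta)$ from the measured LG $w(\theta)$. 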

We note that the solutions we present are implicitly dependent on $\theta$, but $\epsilon$, as defined, is a constant (Eq. 28 does not 
allow for a $\theta$ dependence). If the measured value of $\epsilon$ changes depending on $\theta$, higher order corrections may be necessary. 
In this work, we did not find that the error-bars on $\epsilon$ (as displayed in Fig. 6) warranted applying higher-order corrections. 
However, the value of $\epsilon$ for the seeing does appear to have a strong dependence on the angular scale, suggesting that if the 
errors were smaller, higher-order corrections could become necessary. 
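
The closed-form solutions above are equivalent to solving a small linear system at each angular scale, which generalizes to any number of systematics. The sketch below assumes the relation $w_{g,i} = \sum_j \epsilon_j w_{i,j}$ implied by Eq. B3, and further assumes that the total term removed from the measured $w(\theta)$ is $\sum_{i,j}\epsilon_i\epsilon_j w_{i,j}$, which reduces to $\epsilon^2 w_{\rm sys}$ for a single systematic. All numerical values are toy inputs and the function names are ours.

```python
import numpy as np


def solve_epsilons(w_sys, w_g_sys):
    """Solve w_{g,i} = sum_j eps_j * w_{i,j} for eps at one angular scale.

    w_sys   : (n, n) symmetric matrix; the diagonal holds the systematic
              auto-correlations w_i, off-diagonals the cross-terms w_{i,j}
    w_g_sys : length-n vector of LG x systematic cross-correlations w_{g,i}
    """
    w_sys = np.asarray(w_sys, dtype=float)
    w_g_sys = np.asarray(w_g_sys, dtype=float)
    return np.linalg.solve(w_sys, w_g_sys)


def systematic_contribution(eps, w_sys):
    """sum_i sum_j eps_i eps_j w_{i,j} (assumed form of the subtracted term)."""
    eps = np.asarray(eps, dtype=float)
    return eps @ np.asarray(w_sys, dtype=float) @ eps


# Toy measurements at a single angular scale for three systematics.
w_sys = np.array([[0.050, 0.010, 0.002],
                  [0.010, 0.030, 0.004],
                  [0.002, 0.004, 0.020]])
eps_true = np.array([-0.03, 0.01, 0.02])
w_g_sys = w_sys @ eps_true   # cross-correlations implied by eps_true
w_true = 0.01                # toy "true" galaxy auto-correlation
w_obs = w_true + systematic_contribution(eps_true, w_sys)

eps = solve_epsilons(w_sys, w_g_sys)
w_corrected = w_obs - systematic_contribution(eps, w_sys)
print(eps, w_corrected)
```

In practice this would be repeated for every $\theta$ bin, and a strong variation of the recovered $\epsilon_i$ with $\theta$ would signal the need for the higher-order corrections discussed above.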



